Specialist Sequence Alignment program

Peter M. Woollard pwoollar at hgmp.mrc.ac.uk
Tue Sep 30 05:14:35 EST 1997


Hi,
   One of our users has an interesting alignment problem,
and I wondered if anyone knew of a potentially suitable program
for it.

Problem
=======

She wishes to locate the GAPPED complement of a sequence 
against a RELATIVELY UNGAPPED template sequence.

----------------------------
One way is to partially align two sequences, where:

For the source sequence, the alignment will require:
   Many gaps (often after every base!)
   Maximum gap size of 10.
 
For the template sequence to be aligned to, allow:
   Few gaps and maximum gap size of 2.

The program has to use the IUB nucleic acid (NA) base codes.

(There are other would-be-nices, but I'd expect them to be far
  too specialised.)
---------------------------- 
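
To make the IUB requirement concrete, here is a minimal Python sketch of how two IUB codes might be compared. The function name iub_compatible is hypothetical, and treating U as T is an assumption made so that RNA and DNA sequences can be compared directly.

```python
# Standard IUB/IUPAC nucleotide ambiguity codes, each mapped to the bases
# it can stand for. U is treated as T here (an assumption of this sketch).
IUB = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def iub_compatible(x, y):
    """Return True if codes x and y share at least one possible base."""
    return bool(set(IUB[x.upper()]) & set(IUB[y.upper()]))
```

An alignment program could call a test like this in place of a plain equality check when scoring a pair of bases.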

I have looked at various programs, but none seem to have 
the required parameters. Does anyone know of a suitable program?
Something that works along the lines of GCG's bestfit program,
with extra parameters would be grand. 
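
To illustrate what such "extra parameters" might look like, here is a rough Python sketch of a local-alignment dynamic programme with a separate maximum gap length for each sequence. It is not an existing program. The function name, the scoring values and the linear gap cost are all assumptions, bases are compared by exact match only, and for simplicity a gap in one sequence may not sit directly next to a gap in the other. It returns only the best score, not the alignment itself.

```python
def align_score(source, template, max_gap_source=10, max_gap_template=2,
                match=3, mismatch=-3, gap=-1):
    """Best local alignment score with a per-sequence maximum gap length.

    A gap in the source means template bases are skipped, and vice versa.
    All scoring values are illustrative assumptions.
    """
    n, m = len(source), len(template)
    # best[i][j]: best score of a local alignment whose last aligned pair
    # is (source[i-1], template[j-1]). Row and column 0 are boundaries.
    best = [[0] * (m + 1) for _ in range(n + 1)]
    top = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if source[i - 1] == template[j - 1] else mismatch
            # Either start a new alignment (0) or extend without a gap.
            prev = max(0, best[i - 1][j - 1])
            # Gap in the source: skip k template bases (k <= max_gap_source).
            for k in range(1, max_gap_source + 1):
                if j - 1 - k < 1:
                    break
                prev = max(prev, best[i - 1][j - 1 - k] + gap * k)
            # Gap in the template: skip k source bases (k <= max_gap_template).
            for k in range(1, max_gap_template + 1):
                if i - 1 - k < 1:
                    break
                prev = max(prev, best[i - 1 - k][j - 1] + gap * k)
            best[i][j] = prev + s
            top = max(top, best[i][j])
    return top
```

With these defaults, aligning "AG" against "AUUG" can bridge the two extra bases only if max_gap_source is at least 2. The running time is proportional to the product of the two sequence lengths times the sum of the two gap limits.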

Alternatively, if someone can think of a different approach,
that would be good too; I have considered regular expressions.
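
As a sketch of how the regular-expression idea might look in Python: each base of the source becomes a character class (so IUB codes are honoured), and up to max_gap arbitrary bases are allowed between consecutive source bases. The function names are hypothetical, U is treated as T, and this covers only gaps on the source side; a limit on gaps in the template is not modelled, and ambiguity codes in the template itself will not match.

```python
import re

# IUB/IUPAC codes as regex character classes over A, C, G, T.
# U is mapped to T (an assumption of this sketch).
IUB_CLASS = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def gapped_regex(source, max_gap=10):
    """Compile a pattern allowing 0..max_gap extra bases between source bases."""
    parts = [IUB_CLASS[b] for b in source.upper()]
    # Lazy quantifier, so the shortest filler is tried first.
    filler = "[ACGT]{0,%d}?" % max_gap
    return re.compile(filler.join(parts))


def find_gapped(source, template, max_gap=10):
    """Return the (start, end) span of the first hit in template, or None."""
    target = template.upper().replace("U", "T")
    hit = gapped_regex(source, max_gap).search(target)
    return hit.span() if hit else None
```

For example, find_gapped("AGR", "CCAUUGUA") finds a hit spanning positions 2 to 8, while the same search with max_gap=1 finds nothing. A regular expression only reports whether and where a hit exists; it gives no score, so ranking competing hits would need something more.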


Background:
===========

RNA editing in a protozoan parasite is being investigated.
RNA editing is a post-transcriptional alteration of RNA sequence
to produce a protein-coding mRNA. Uridine residues are added
and, less often, deleted. (It is complex!)
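
One consequence that may help when checking candidate alignments: if editing only adds or removes U residues, the unedited and edited RNA must be identical once every U is stripped out. A minimal Python sketch of that check follows; the function names are hypothetical, and T is treated as U so that DNA-style sequences can be used as well.

```python
def non_u_skeleton(rna):
    """Return the sequence with all U/T residues and non-letters removed."""
    return "".join(b for b in rna.upper() if b.isalpha() and b not in "UT")


def consistent_with_u_editing(unedited, edited):
    """True if the two sequences differ only by inserted or deleted U's."""
    return non_u_skeleton(unedited) == non_u_skeleton(edited)
```

Spaces and other padding characters are ignored, so sequences copied from a padded display can be compared directly.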

Guide RNA (gRNA) is used as the template for this editing.
The researcher has many minicircle sequences from a target organism
and wishes to locate and align gRNA in these sequences, knowing the 
sequences of RNA which are edited by the gRNA.

I have listed an example of RNA editing below my .signature
(copied from http://www.lifesci.ucla.edu/RNA/trypanosome/ga6.html)

There is more information in the following WWW link:
http://www.lifesci.ucla.edu/RNA/trypanosome/index.htm


Best Regards,
             Peter Woollard
 
----------------------------------------------------------------------
Bioinformatics,                    mailto:p.woollard at hgmp.mrc.ac.uk
UK MRC Human Genome Mapping Project  http://www.hgmp.mrc.ac.uk/
Resource Centre,                      Fax: ++44 (0)1223 494 512
Hinxton, Cambridge, CB10 1SB, UK      Tel: ++44 (0)1223 494 523
----------------------------------------------------------------------



5' pan-editing of L. tarentolae ATPase 6 (MURF4)
mRNA:

Top 3' is(are) the gRNA(s)  (guide RNA template)
The first  5' is the unedited RNA
The second 5' is the final edited RNA
Then the AA residues



                        3'-[U]U-UUU-UUaUagaUagagaagaUaAACgUUagaUCa
                              | ||| |||:|:|:|:|:||:||||||:|||:||||
5'..UAUAUAAAAAAUUAUAUCAGAUUAAGAUAAAUAA G   G        G UUG GA   AG 
5'..UAUAUAAAAAAUUAUAUCAGAUUAAGA AAA AAuGuuuGuuuuuuuuGuUUGuGAuuuAGu
                                     M  F  V  F  F  V  C  D  L  V  


   UUaAU-ACGCAUAAUAAUA-5'
   ||||| ||||||||
3'-[U]aU-AUGUaUaagaUgAUgUgaaaaCaaUaUCACAAACUAGGUCUC-5'
      || |:|:||||:||:||:|:|||||||||||||||||||||||||
                              3'-[U]CaUaaaCUAGGAUUUgaUaaaaaCaCaa
                                    ||:|||||||||:||:||||||||||||
   AA  AUUGCG A    A UA G     G  A AG G   GAUCCAGAA  A     G G  
   AAuuA UGCGuAuuuuAuUAuGuuuuuGuuAuAGuGuuuGAUCCAGAAuuAuuuuuGuGuu
   I  M   R  I  L  L  C  F  C  Y  S  V  W  S  R  I  I  F  V  L  


      UAAAAUAU--UACAAUAUAU-5'
      ||||||||  ||||| | ||
3'-[U]UgagaUgU--UaCaagggAUaUaaaUaUGGCUCAAUUACAAAUUG-5'
      |:|:||:|  |||||:::|||||||:||||||||||||||||
                               3'-[U]UaaUUaUagaaCaUagagaCUgUaaaUaa
                                     :|||||:|:||||||:|:|||:|||||||
      A    A AUUA G     UA A   G ACCGAG  AA G    G A     GA G   A   
      AuuuuAuA  AuGuuuuuUAuAuuuGuACCGAGuuAAuGuuuuGuAuuuuuGAuGuuuAuu
       F  Y  N    V  F  Y  I  C  U  E  L  M  F  C  I  F  D  V  Y  L  


      AUAAACAACCAAAUAUA-5'
      |||||||||||||||:|
3'-[U]aUagaUagUUagaUaCgUgCaaaUagaCagaUaCUAAGCACAAUAUA-5'
      |||:|:|:::|:||||:|:||||||:|||:|||||||||||||||
       A   G  GG   A G A G   A   G   A GAUUCGUGUUAUUUAAUUUUUAUGGAUU..3'
      uAuuuGuuGGuuuAuGuAuGuuuAuuuGuuuAuGAUUCGUGUUAUUUAAUUUUUAUGGAUU..3'
        F  V  G  L  C  M  F  I  C  L  W  F  V  L  F  N  F  Y  G  L



