? about DNA alignment

Lukas Knecht knecht at inf.ethz.ch
Mon Jun 28 04:10:14 EST 1993


Hello!

In article 4374, Jeremy J. Ahouse writes:

>     I have done a series of multiple alignments.  The alignments were done
> with inferred amino acid sequences.  Now that I am happy with the
> alignments I want to go back to the mRNA sequence (which I have) for some
> of the clustering and parsimony analysis.  I want to enforce the alignments
> (gaps, etc...) from the aa's on the nucleotide alignment.

The computational biochemistry tool Darwin, being developed by the
Computational Biochemistry Research Group at ETHZ, contains two functions
that do not exactly solve Jeremy's problem but might be of interest here.
Moreover, because Darwin has its own programming language, it would be almost
trivial to write a function that produces a multiple alignment of genes from a
multiple alignment of the inferred proteins.
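
To show how small that function is, here is a sketch in Python rather than Darwin's own language. It assumes that each mRNA has already been trimmed to its coding sequence, that the sequence starts exactly at the first codon of the inferred protein, and that it contains no introns or frame shifts. Under those assumptions, every protein residue is replaced by its codon and every gap becomes a three-base gap. The names `thread_codons` and `thread_alignment` are hypothetical.

```python
def thread_codons(aligned_protein: str, cds: str, gap: str = "-") -> str:
    """Replace each residue of one aligned protein row by its codon.

    Assumes cds starts exactly at the first codon of the protein and has no
    introns or frame shifts, so residue k (ignoring gaps) is cds[3k:3k+3].
    Any trailing bases, such as a stop codon, are ignored.
    """
    n_residues = sum(1 for c in aligned_protein if c != gap)
    if len(cds) < 3 * n_residues:
        raise ValueError("coding sequence shorter than 3 x protein length")
    pieces = []
    k = 0  # index of the next unused codon in cds
    for c in aligned_protein:
        if c == gap:
            pieces.append(gap * 3)  # one protein gap becomes a codon-sized gap
        else:
            pieces.append(cds[3 * k:3 * k + 3])
            k += 1
    return "".join(pieces)


def thread_alignment(aligned_proteins: list[str], cdss: list[str],
                     gap: str = "-") -> list[str]:
    """Apply thread_codons row by row to a whole multiple alignment."""
    if len(aligned_proteins) != len(cdss):
        raise ValueError("need one coding sequence per alignment row")
    return [thread_codons(p, n, gap) for p, n in zip(aligned_proteins, cdss)]


# Example with two made-up rows:
print(thread_alignment(["MK-V", "M-RV"], ["ATGAAAGTT", "ATGCGTGTC"]))
# ['ATGAAA---GTT', 'ATG---CGTGTC']
```

The gap character is a parameter, so rows written with `_` (as in the displays below) can be threaded by passing `gap="_"`.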

One function aligns two homologous protein-coding nucleotide sequences codon
by codon, thereby finding the reading frames, frame shifts and introns in the
genes. Part of a typical nucleotide-nucleotide alignment looks like this:

                ..40....|..150....|...60....|...70....|...80....|...90....
________________GCTGTCTGCCGCCGACAAGACCAACGTCAAGGCCGCCTGGAGTAAGGTTGGCGGCCAC
                C><C><L><P>  <D><K><T><N><V><K><A><A><W><S><K><V>      <H>
                .  .  :  |    :  |  .  .  :  .  .  .  |  |  |  |        . 
                Y><T><M><P>  <N><K><A><L><I><T><G><F><W><S><K><V>      <K>
gcagctacacaaacagACACCATGCCG__AATAAGGCCCTAATCACCGGCTTCTGGAGCAAGGTG__(6)_AAA
60....|...70....|...80....|  ...90....|..900....|...10....|...20.      ...

(lowercase bases mark introns detected by the alignment)
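
The splicing and frame arithmetic behind such a display can be sketched as follows. This is not how Darwin finds frames, since it aligns codon by codon against the homolog. The sketch only drops the lowercase (intron) bases and the display characters, then translates the remaining exon bases with the standard genetic code in a chosen frame. The helper names `exon_bases`, `translate` and `three_frame_translation` are hypothetical.

```python
from itertools import product

# Standard genetic code; codons enumerated in TCAG order, first base slowest.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def exon_bases(row: str) -> str:
    """Keep only uppercase nucleotides, dropping lowercase (intron) bases
    and display characters such as '_' or a gap-length label like '(6)'."""
    return "".join(c for c in row if c in "ACGT")


def translate(seq: str, frame: int = 0) -> str:
    """Translate seq starting at offset frame (0, 1 or 2); unknown codons give 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(frame, len(seq) - 2, 3))


def three_frame_translation(seq: str) -> list[str]:
    """Translations of the three forward reading frames."""
    return [translate(seq, f) for f in range(3)]


# Lower sequence row of the alignment above:
row = "gcagctacacaaacagACACCATGCCG__AATAAGGCCCTAATCACCGGCTTCTGGAGCAAGGTG__(6)_AAA"
print(translate(exon_bases(row), frame=2))  # TMPNKALITGFWSKVK
```

Frame 2 skips the leading `AC`, which is the tail of the `Y` codon split by the intron. It reproduces the residues printed under that row.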

The other function aligns a gene with a (homologous) protein. A sample output
might be: 

....|...40....|...50....|...60....|...70....|...80....|...90....|..100....
CCACCAGAGTGTCTTCAACTGCAACTTCTTCTATTTATCAGAAGCAACGTCGACCCACCTATTCATCATCAAAA
<P><P><E><C><L><Q><L><  P><S><S><I><Y><Q><K><Q><R><R><P><T><Y><S><S><S><K>
 |  |  .  .  |  .  |    |  |  |     .  .  .  .  .  .  |  .  |  |  :  :  .
 P  P  T  Y  L  P  L  __P  S  S  P  I  Y  S  P  P  P  P  V  Y  S  P  P  P
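
The dynamic programming underlying any such alignment can be sketched as a plain global (Needleman-Wunsch) alignment between an already translated gene and the protein. This is only an assumption-laden illustration. It ignores frames and introns, and it uses toy identity scores with a linear gap penalty, whereas the `|`, `:` and `.` marks in the output above suggest graded similarity scores. The name `global_align` and its default scores are hypothetical.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1,
                 gap: int = -2) -> tuple[int, str, str]:
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b); '-' marks a gap.
    """
    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner, following optimal moves.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(global_align("PSSIY", "PSSPIY"))  # (3, 'PSS-IY', 'PSSPIY')
```

A real gene-protein aligner would score codons against residues directly, which is what allows it to place frame shifts and introns.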
  
Darwin binaries are available free of charge for DECstation and SUN-Sparc
machines (for other machines, send mail to knecht at inf.ethz.ch). They come
with a 200-page tutorial. If you are interested, send mail to
wertli at inf.ethz.ch.

------------------------------------------------------------------------
Lukas Knecht                                  e-mail: knecht at inf.ethz.ch
Institut fuer Wissenschaftliches Rechnen      phone:  +41 1 254 74 75
IFW D29.1                                     fax:    +41 1 262 39 73
ETH Zentrum
CH-8092 Zuerich
------------------------------------------------------------------------
