Jim Garey asks about sequence alignments of rRNA
A few citations on using secondary structure information to align rRNA
primary sequences:
Gutell, R. et al. 1985. Prog. Nucleic Acid Res. Mol. Biol. 32:155-216.
Woese, C. et al. 1983. Microbiol. Rev. 47:621-669.
Neefs et al. 1990. Nucleic Acids Res. 18:2237-2317.
Essentially, the way it is done (or at least the way I've done it) is to align
highly conserved regions of primary structure first. New sequences can then be
overlaid onto the modeled secondary structure of a known sequence (such as
E. coli), and nucleotides occupying equivalent positions in the secondary
structure can be aligned in the primary structure. Of course, you can run into
lots of problems, such as differences in secondary structure. For a simple
example, suppose you have three sequences:

        1          2           3
       A A        A G        A G A
      G   G      A   G      A     G
      A   G      A   G      A     G
       G-C        G-C         G-C
       T-A        A-T         A-T
       A-T        A-T         A-T

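The structure-guided step can be sketched in a few lines. This is only an illustration under stated assumptions: the three hairpins are encoded in dot-bracket notation (the post draws them by hand; the encoding, the helper names `base_pairs` and `hairpin_segments`, and the single-hairpin restriction are all hypothetical choices for this sketch), T is kept instead of U to match the post, and each hairpin is checked for Watson-Crick stem pairs and then split into 5' stem, loop, and 3' stem.

```python
# Sketch only: dot-bracket encoding of the post's hairpins is an assumption.
# T is used instead of U to match the sequences in the post.

WATSON_CRICK = {("G", "C"), ("C", "G"), ("A", "T"), ("T", "A")}


def base_pairs(structure: str) -> list[tuple[int, int]]:
    """Return sorted (i, j) index pairs for matching brackets in dot-bracket notation."""
    stack: list[int] = []
    pairs: list[tuple[int, int]] = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def hairpin_segments(seq: str, structure: str) -> tuple[str, str, str]:
    """Split a single hairpin into (5' stem, loop, 3' stem) after checking its pairs."""
    if len(seq) != len(structure):
        raise ValueError("sequence and structure lengths differ")
    k = structure.count("(")
    # Hypothetical restriction: only one stem closing one loop, e.g. "(((......)))".
    if structure != "(" * k + "." * (len(seq) - 2 * k) + ")" * k:
        raise ValueError("structure is not a single simple hairpin")
    for i, j in base_pairs(structure):
        if (seq[i], seq[j]) not in WATSON_CRICK:
            raise ValueError(f"non-canonical pair {seq[i]}-{seq[j]} at {i},{j}")
    return seq[:k], seq[k:len(seq) - k], seq[len(seq) - k:]


hairpins = {
    1: ("ATGAGAAGGCAT", "(((......)))"),
    2: ("AAGAAAGGGCTT", "(((......)))"),
    3: ("AAGAAAGAGGCTT", "(((.......)))"),
}
for name, (seq, structure) in hairpins.items():
    print(name, hairpin_segments(seq, structure))
```

The stems come out 3 nt long in all three sequences and align column for column, while the loops are 6, 6, and 7 nt long. The structure therefore tells you the gap belongs somewhere in the loop, but not exactly where.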
The position of the gap (due to the extra nucleotide in sequence 3) may be
hard to determine; both of these alignments are plausible:
    1 ATGAGA-AGGCAT          1 ATGAGA-AGGCAT
    2 AAGAAA-GGGCTT    or    2 AAGAAAG-GGCTT
    3 AAGAAAGAGGCTT          3 AAGAAAGAGGCTT
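One way to see why the choice is hard is to score both alignments with a simple sum-of-pairs objective. This is a standard multiple-alignment score, not the procedure described above, and the values used (match +1, mismatch -1, a linear gap cost per residue-gap pair, gap-gap pairs scored 0) are assumptions for illustration. The function name `sum_of_pairs` is hypothetical.

```python
from itertools import combinations


def sum_of_pairs(alignment: list[str], match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Sum-of-pairs score over all columns and all sequence pairs.

    Assumed scheme: linear gap cost for each residue-gap pair; gap-gap pairs score 0.
    """
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("aligned sequences must have equal length")
    score = 0
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            if a == "-" and b == "-":
                continue  # both gapped: contributes nothing
            elif a == "-" or b == "-":
                score += gap
            elif a == b:
                score += match
            else:
                score += mismatch
    return score


left = ["ATGAGA-AGGCAT", "AAGAAA-GGGCTT", "AAGAAAGAGGCTT"]
right = ["ATGAGA-AGGCAT", "AAGAAAG-GGCTT", "AAGAAAGAGGCTT"]

for gap in (-2, -1):
    print(gap, sum_of_pairs(left, gap=gap), sum_of_pairs(right, gap=gap))
```

With a gap cost of -2 the left alignment scores higher (16 vs 15); with -1 the right one wins (18 vs 19). The preferred gap position flips on an arbitrary parameter rather than on anything in the data.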
It gets worse with more sequences. Usually, regions like these are left out of
phylogenetic reconstructions because they are difficult to align. Fortunately,
they make up only a minority of sites.
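Leaving such regions out can be sketched as a column mask. The rule used here is an assumption, not the author's stated method: treat as unreliable every column where two plausible alignments disagree, drop those columns, and compute distances on what remains. The helper names are hypothetical, and the uncorrected p-distance stands in for whatever distance or tree method is actually used.

```python
from fractions import Fraction

# The two candidate alignments from the post (same ungapped sequences).
left = ["ATGAGA-AGGCAT", "AAGAAA-GGGCTT", "AAGAAAGAGGCTT"]
right = ["ATGAGA-AGGCAT", "AAGAAAG-GGCTT", "AAGAAAGAGGCTT"]


def unstable_columns(aln_a: list[str], aln_b: list[str]) -> set[int]:
    """0-based columns where two same-shaped alignments of the same sequences disagree."""
    if [r.replace("-", "") for r in aln_a] != [r.replace("-", "") for r in aln_b]:
        raise ValueError("alignments must contain the same sequences in the same order")
    if len({len(r) for r in aln_a + aln_b}) != 1:
        raise ValueError("this sketch only compares alignments with equal column counts")
    return {
        i
        for i, (col_a, col_b) in enumerate(zip(zip(*aln_a), zip(*aln_b)))
        if col_a != col_b
    }


def mask_columns(alignment: list[str], drop: set[int]) -> list[str]:
    """Remove the given columns from every row of the alignment."""
    return ["".join(ch for i, ch in enumerate(row) if i not in drop) for row in alignment]


def p_distance(x: str, y: str) -> Fraction:
    """Proportion of differing sites, skipping columns with a gap in either sequence."""
    sites = [(a, b) for a, b in zip(x, y) if a != "-" and b != "-"]
    if not sites:
        raise ValueError("no comparable sites")
    return Fraction(sum(a != b for a, b in sites), len(sites))


drop = unstable_columns(left, right)
masked = mask_columns(left, drop)
# After masking, both candidate alignments yield identical data.
assert masked == mask_columns(right, drop)
for i in range(len(masked)):
    for j in range(i + 1, len(masked)):
        print(i + 1, j + 1, p_distance(masked[i], masked[j]))
```

Here the mask removes two of the 13 columns, after which the distances no longer depend on which alignment was chosen. Dropping only the differing columns is the narrowest possible mask; a more conservative choice would exclude the whole loop.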
Jonathan A. Eisen
Department of Biological Sciences
Stanford, CA 94305-5020
jeisen at kimura.stanford.edu