James McInerney wrote:
> Dear all,
>
> Traditional dogma suggests that we should use protein sequences for
> inferring relationships from molecular sequences in those instances when
> the underlying DNA sequences might be suffering from convergence due to
> mutational bias.
>
> The suggestion being that protein sequences suffer very little from
> compositional convergence. I am wondering how true this is. If we
> think about the classification of amino acids (aromatic, small polar
> etc.) then there are only a limited number of _allowable_ substitutions
> at any one site (I am of course using this term _allowable_ in a loose
> way). In other words, the substitution space for a particular amino
> acid is much smaller than 19 (20 including indels) other character states.
>
> So, what about convergence in protein-coding sequences? Is it rampant?
> Is it as extensive as (for instance) thermophilic convergence in
> ribosomal RNA sequences?
>
> In reality, if an aromatic amino acid is needed at a particular
> location, then the replacement of phenylalanine by tryptophan or
> tyrosine is much more likely and also the existence of homoplastic
> changes for this site is probably more likely than at the nucleotide
> level when there are four alternatives, rather than (_effectively_) two!
>
> So, stepping off my soapbox for a second, does anybody agree with this
> comment, or is it completely wrong? I have inferred amino acid
> compositional trees and often it is possible to generate very different
> trees on the basis of composition and on the basis of, say, parsimony or
> likelihood analysis of the characters. So homoplastic amino acid
> compositional change does exist. But does it affect
> phylogeny reconstruction?
>
> Do we have any good studies of amino acid compositional convergence?
> Protein similarity that is not due to recentness of common ancestry, but
> rather due to compositional convergence (or parallelism, or reversal or
> any homoplastic event you like to name)?
>
> Any input is gratefully received.
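Here is a minimal sketch of how an amino acid compositional distance matrix, of the kind behind the compositional trees mentioned in the question, might be computed. It makes two assumptions that the message does not state. First, each taxon is summarised only by its frequencies of the 20 standard amino acids, with gaps and ambiguity codes ignored. Second, taxa are compared by Euclidean distance. The function names are hypothetical. The resulting matrix could be passed to any distance-based tree method.

```python
from collections import Counter
import math

# The 20 standard amino acids; anything else (gaps, X, B, Z) is ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq: str) -> list[float]:
    """Frequency of each standard amino acid in seq (assumption: gaps and
    ambiguity codes are simply dropped before counting)."""
    counts = Counter(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def composition_distance(a: str, b: str) -> float:
    """Euclidean distance between two composition vectors (an assumed
    choice of metric, used here only for illustration)."""
    ca, cb = composition(a), composition(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(ca, cb)))


def distance_matrix(seqs: dict[str, str]) -> dict[tuple[str, str], float]:
    """Pairwise compositional distances for every unordered pair of taxa."""
    names = list(seqs)
    return {
        (p, q): composition_distance(seqs[p], seqs[q])
        for i, p in enumerate(names)
        for q in names[i + 1:]
    }
```

Composition ignores the order of residues. Two sequences with identical amino acid frequencies therefore get a distance of zero even if they share no aligned identities. This is one reason a compositional tree can disagree with parsimony or likelihood trees built from the aligned characters.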
James,

I worried about this problem just this past summer while I was building
protein sequence trees from highly diverged sequences. I did the following.
I computed both parsimony and distance trees on the aligned sequences, with
indels either removed or weighted with an A-to-W replacement value
(Dayhoff weights for the distance trees). Next, I recoded the aligned
sequences so that each of the following clusters became a single amino acid:
(I,L,V), (F,Y,W), (S,T), (D,E) and (K,R), and computed the same sets of trees.
Both data sets gave very similar results, i.e. the same tree topologies and
only minor differences in relative branch lengths. I was also performing
relative rate tests using the distance matrix and obtained the same relative
patterns (of course, the absolute distances were different). Actually, I was
surprised by this result, because I had gone to all this trouble in order to
document significant homoplasy among the amino acid replacements.
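The recoding comparison described above can be sketched in Python as follows. This is an illustration under stated assumptions and does not reproduce the exact procedure:

- indels are handled by dropping gapped columns, and the Dayhoff-weighted alternative is not reproduced;
- distances are plain p-distances rather than Dayhoff-weighted ones;
- the relative rate comparison is the simple difference d(A,O) - d(B,O), with no significance test.

The helper names and the toy alignment are hypothetical.

```python
from itertools import combinations

# Amino acid clusters collapsed to a single state in the reply above.
CLUSTERS = ["ILV", "FYW", "ST", "DE", "KR"]

# Map each cluster member to the cluster's first letter; the choice of
# representative letter is arbitrary, only the grouping matters.
RECODE = {aa: group[0] for group in CLUSTERS for aa in group}


def recode(seq: str) -> str:
    """Collapse each cluster to one state; other characters are unchanged."""
    return "".join(RECODE.get(c, c) for c in seq.upper())


def p_distance(a: str, b: str, gap: str = "-") -> float:
    """Proportion of differing sites, skipping any column with a gap in
    either sequence (the 'indels removed' option only)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    sites = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not sites:
        raise ValueError("no gap-free sites to compare")
    return sum(x != y for x, y in sites) / len(sites)


def distance_matrix(aln: dict[str, str]) -> dict[tuple[str, str], float]:
    """Symmetric pairwise p-distance matrix keyed by (name, name)."""
    d = {}
    for p, q in combinations(aln, 2):
        d[(p, q)] = d[(q, p)] = p_distance(aln[p], aln[q])
    return d


def relative_rate(d, a: str, b: str, outgroup: str) -> float:
    """d(A,O) - d(B,O): near zero if A and B have evolved at similar rates
    (a descriptive difference only, with no significance test)."""
    return d[(a, outgroup)] - d[(b, outgroup)]


if __name__ == "__main__":
    # Hypothetical toy alignment; "O" is the outgroup.
    aln = {"A": "IFSDK-AG", "B": "LYTER-AG", "O": "VWSDKMAC"}
    recoded = {name: recode(seq) for name, seq in aln.items()}
    for label, data in (("raw", aln), ("recoded", recoded)):
        d = distance_matrix(data)
        print(label, round(relative_rate(d, "A", "B", "O"), 3))
```

Collapsing clusters can only remove differences, never create them, so recoded distances are never larger than raw ones. In the toy alignment, lineage B's apparent excess distance to the outgroup comes entirely from within-cluster replacements, and it disappears after recoding.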
> Dr. James O. McInerney,
> Dept. Biology,                Dept. Zoology,
> Natl. Univ. Ireland,          The Natural History Museum,
> Maynooth,            and      Cromwell Road,
> Co. Kildare, Ireland          London SW7 5BD, UK.
> Phone +353 1 708 3860         +44 171 938 9163
> Fax   +353 1 708 3845         +44 171 938 9158
> email james.o.mcinerney at may.ie    j.mcinerney at nhm.ac.uk
> http://www.may.ie/academic/biology/jmbioinformatics.shtml