Traditional dogma suggests that we should use protein sequences for
inferring relationships from molecular sequences in those instances when
the underlying DNA sequences might be suffering from convergence due to
The suggestion is that protein sequences suffer very little from
compositional convergence. I am wondering how true this is. If we
think about the classification of amino acids (aromatic, small polar
etc.) then there are only a limited number of _allowable_ substitutions
at any one site (I am of course using this term _allowable_ in a loose
way). In other words, the substitution space for a particular amino
acid is much smaller than 19 (20 including indels) other character states.
So, what about convergence in protein-coding sequences? Is it rampant?
Is it as extensive as (for instance) thermophilic convergence in
ribosomal RNA sequences?
In reality, if an aromatic amino acid is needed at a particular
location, then the replacement of phenylalanine by tryptophan or
tyrosine is much more likely than replacement by any other residue.
Homoplastic changes at such a site are then probably more likely than
at the nucleotide level, where each base has three alternatives rather
than (_effectively_) two!
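
To make the "reduced substitution space" argument concrete, here is a minimal Python sketch. The grouping in AA_CLASSES is purely an illustrative assumption (many competing classification schemes exist), and the function name allowed_substitutions is hypothetical. It simply lists the other members of a residue's class, i.e. the loosely _allowable_ replacements.

```python
# Hypothetical physicochemical grouping, for illustration only;
# other schemes would give different class sizes.
AA_CLASSES = {
    "aromatic": "FWY",
    "aliphatic": "AVLIM",
    "small_polar": "STNQ",
    "positive": "KRH",
    "negative": "DE",
    "special": "CGP",
}

def allowed_substitutions(residue):
    """Return the other residues in the same (assumed) class, sorted."""
    residue = residue.upper()
    for members in AA_CLASSES.values():
        if residue in members:
            return sorted(aa for aa in members if aa != residue)
    raise ValueError(f"unknown amino acid: {residue!r}")

if __name__ == "__main__":
    for aa in "FAD":
        print(aa, allowed_substitutions(aa))
```

Under this assumed grouping phenylalanine has only two within-class alternatives (W and Y), compared with three alternatives for any nucleotide.
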
So, stepping off my soapbox for a second, does anybody agree with this
comment, or is it completely wrong? I have inferred amino acid
compositional trees and often it is possible to generate very different
trees on the basis of composition and on the basis of, say, parsimony or
likelihood analysis of the characters. So homoplastic amino acid
compositional change does exist. But, does it affect
Do we have any good studies of amino acid compositional convergence?
Protein similarity that is not due to recentness of common ancestry, but
rather due to compositional convergence (or parallelism, or reversal or
any homoplastic event you like to name)?
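
For anyone wanting to look at compositional similarity in their own data, here is a rough Python sketch of one way to compute pairwise compositional distances. It is not necessarily how the compositional trees mentioned above were built: the choice of Euclidean distance between amino acid frequency vectors is an assumption, and the function names are hypothetical. The resulting distances could be passed to any distance-based tree method.

```python
from collections import Counter
from itertools import combinations
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq):
    """Frequency of each standard amino acid; gaps and ambiguity codes are ignored."""
    counts = Counter(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]

def composition_distance(seq_a, seq_b):
    """Euclidean distance between two composition vectors (an assumed metric)."""
    fa, fb = composition(seq_a), composition(seq_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)))

def distance_matrix(seqs):
    """Pairwise compositional distances for a dict mapping name to sequence."""
    return {
        (a, b): composition_distance(seqs[a], seqs[b])
        for a, b in combinations(sorted(seqs), 2)
    }
```

Two sequences with the same residues in a different order get a distance of zero here, which is exactly why a tree built from such distances can disagree with a character-based analysis.
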
Any input is gratefully received.
Dr. James O. McInerney,

Dept. Biology,
Natl. Univ. Ireland,
Maynooth,
Co. Kildare, Ireland
Phone +353 1 708 3860
Fax +353 1 708 3845
email james.o.mcinerney at may.ie

and

Dept. Zoology,
The Natural History Museum,
Cromwell road,
London SW7 5BD, UK.
Phone +44 171 938 9163
Fax +44 171 938 9158
email j.mcinerney at nhm.ac.uk

http://www.may.ie/academic/biology/jmbioinformatics.shtml