Making alignments

Guy A. Hoelzer hoelzer at med.unr.edu
Tue Jan 20 15:11:50 EST 1998


In article <6a2jj2$181 at net.bio.net>, Nick Goldman
<N.Goldman at gen.cam.ac.uk> wrote:

> James McInerney wrote:
> > Yes, I read this also and I thought it was a bizarre thing to say.
> > Given that convergences in base compositional terms (two very distant
> > sequences converge on a similar base composition) are widespread, and
> > also that synonymously degenerate third positions very quickly become
> > saturated with substitutions, I cannot recommend the use of DNA
> > sequences when these sequences can be translated into proteins.  I wish
> > that he had elaborated on this sentence

> Please recall that I was writing about phylogenetic estimation, and not
> about sequence alignment.

> I stand by my comment that DNA sequences contain more information:  you
> can translate DNA --> amino acids, but not vice versa without
> ambiguity.  Therefore, all other things being equal, it would be better
> to base analyses on the DNA sequences.
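Goldman's argument that translation runs only one way can be made concrete with a short sketch. It assumes the standard genetic code (NCBI translation table 1) and uppercase, in-frame input. The function names are illustrative, not from any package. Translating DNA always gives one protein, but counting the coding sequences behind a protein shows how much is lost going the other way.

```python
from collections import Counter
from math import prod

# Standard genetic code (NCBI table 1). Codons are enumerated with the
# bases in T, C, A, G order at positions 1, 2 and 3; '*' marks a stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# How many codons encode each amino acid (the degeneracy of the code).
CODONS_PER_AA = Counter(CODON_TABLE.values())


def translate(dna: str) -> str:
    """DNA -> protein: always a single answer (a trailing partial codon is ignored)."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def n_back_translations(protein: str) -> int:
    """Protein -> DNA: number of distinct coding sequences for this protein."""
    return prod(CODONS_PER_AA[aa] for aa in protein)


print(translate("CTTTCA"), translate("TTAAGC"))                # LS LS
print(n_back_translations("MW"), n_back_translations("LSR"))   # 1 216
```

The two DNA strings differ at five of their six positions yet encode the same dipeptide, and a three-residue peptide of Leu, Ser and Arg (six codons each) already has 216 possible coding sequences.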

I agree that DNA sequences contain more variation than their corresponding
amino acid sequences but, as suggested by Dr. McInerney, the evolution of
that variation can erase phylogenetic information and produce homoplasy. 
Of course, homoplasy can mislead phylogeny estimation when it is mistaken
for synapomorphic similarity.
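How saturation erases information can be written down under the simplest substitution model, Jukes-Cantor (JC69). This model is an assumption here, since neither writer names one. With equal base frequencies and equal rates, the expected proportion p of sites that differ between two sequences separated by d substitutions per site is:

$$
p(d) = \frac{3}{4}\left(1 - e^{-4d/3}\right), \qquad
\frac{dp}{dd} = e^{-4d/3}, \qquad
\lim_{d \to \infty} p(d) = \frac{3}{4}.
$$

Because the slope decays exponentially, additional substitutions leave almost no trace in the observed differences once p approaches 3/4. The remaining 1/4 of matching sites is what two unrelated random sequences would share by chance, so identity at such sites is as easily homoplasy as shared ancestry.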

> If you are worried about (e.g.) saturation at third codon positions,
> then a well-developed model of DNA evolution should allow you to
> retrieve some of whatever phylogenetic information IS left, whilst
> simultaneously incorporating the information from other positions and
> automatically adjusting the "weight" given to each in the appropriate
> manner.  [details omitted]
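The kind of correction Goldman describes can be sketched with the JC69 distance. The sketch assumes that model as a stand-in for "a well-developed model of DNA evolution", and it is a distance calculation, not the likelihood machinery he refers to. The corrected distance d = -3/4 ln(1 - 4p/3) recovers substitutions hidden by multiple hits. The delta-method variance shows what that recovery costs, because it grows without bound as p approaches 3/4. The alignment length of 300 sites is hypothetical.

```python
import math


def jc69_distance(p: float) -> float:
    """Jukes-Cantor corrected distance from the observed proportion p of differing sites."""
    if p >= 0.75:
        raise ValueError("p >= 3/4: sequences look saturated, distance is undefined")
    return -0.75 * math.log(1 - 4 * p / 3)


def jc69_variance(p: float, n_sites: int) -> float:
    """Large-sample (delta-method) variance of the JC69 distance estimate."""
    return p * (1 - p) / (n_sites * (1 - 4 * p / 3) ** 2)


for p in (0.1, 0.5, 0.7):
    d = jc69_distance(p)
    se = math.sqrt(jc69_variance(p, n_sites=300))  # hypothetical alignment length
    print(f"p={p:.1f}  d={d:.3f}  SE={se:.3f}")
```

Running the loop shows the standard error rising sharply between p = 0.5 and p = 0.7. This is one concrete sense in which a model can retrieve some of the information that IS left, but only noisily.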

This seems to me to be an overly optimistic view of the value of maximum
likelihood approaches.  In general, maximum likelihood is vulnerable to
errors in the chosen model of evolution and in the parameter estimates
that are used.  Granted, there are ways to test which of a limited set of
models best explains the data and parameter values can be estimated (with
error) from the data.  However, it is a certainty that the perfectly
"correct" model and parameter values will never be tested.  The
magnitude of the resulting errors in phylogeny estimation remains unknown
in particular cases.  While it is possible that a thorough ML analysis
might usually yield a tree that is close to the correct one, I worry that
overly optimistic presentations of the power of ML will lead those less
familiar than Dr. Goldman with the limitations of ML to develop too much
confidence in those results.
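Hoelzer's point that model-based estimates inherit errors from the chosen model can be illustrated with a deliberately misspecified distance, a simpler analogue of the likelihood case. The sketch assumes sequences evolved under Kimura's two-parameter model (K2P) with a strong transition bias. The rates and time are hypothetical. It computes the expected transition and transversion proportions, then estimates the distance both with K2P and with JC69, which wrongly treats all substitutions as equally likely.

```python
import math


def k2p_expected_differences(alpha: float, beta: float, t: float):
    """Expected transition (P) and transversion (Q) proportions under K2P.
    alpha is the transition rate; beta is the rate of each transversion type."""
    P = 0.25 - 0.5 * math.exp(-4 * (alpha + beta) * t) + 0.25 * math.exp(-8 * beta * t)
    Q = 0.5 - 0.5 * math.exp(-8 * beta * t)
    return P, Q


def jc69_distance(p: float) -> float:
    """Equal-rates distance: ignores the transition/transversion split."""
    return -0.75 * math.log(1 - 4 * p / 3)


def k2p_distance(P: float, Q: float) -> float:
    """Distance under the model that actually generated the (expected) data."""
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)


alpha, beta, t = 1.0, 0.1, 0.3          # hypothetical, strongly transition-biased
true_d = 2 * alpha * t + 4 * beta * t   # substitutions per site under K2P
P, Q = k2p_expected_differences(alpha, beta, t)

print(f"true d = {true_d:.3f}")
print(f"K2P estimate (correct model) = {k2p_distance(P, Q):.3f}")
print(f"JC69 estimate (wrong model)  = {jc69_distance(P + Q):.3f}")
```

With these values the K2P estimate recovers the true distance of 0.72 substitutions per site, while the JC69 estimate falls short at about 0.615. In real data the true distance is unknown, so a discrepancy like this would not be visible, which is the substance of the caution above.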

> Phylogenetic analyses from amino acid sequences tend to be less
> well-developed, particularly in the models of sequence evolution they
> consider (typically, no rate heterogeneity amongst sites; less realistic
> description of replacement patterns).  There are exceptions, but far be
> it from ME to plug more recent work by Thorne, Goldman & Jones and
> Goldman, Thorne & Jones.
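Goldman mentions rate heterogeneity amongst sites. A common way to model it is to let site rates follow a gamma distribution. The sketch below assumes the JC69 model with gamma-distributed rates and a hypothetical observed p = 0.5, and shows how much ignoring rate variation can shorten the estimated distance. Smaller values of the shape parameter a mean stronger variation; as a grows large, the gamma formula approaches the equal-rates JC69 result.

```python
import math


def jc69_distance(p: float) -> float:
    """Jukes-Cantor distance assuming every site evolves at the same rate."""
    return -0.75 * math.log(1 - 4 * p / 3)


def jc69_gamma_distance(p: float, a: float) -> float:
    """Jukes-Cantor distance with gamma-distributed rates across sites;
    a is the gamma shape parameter (small a = strong rate variation)."""
    return 0.75 * a * ((1 - 4 * p / 3) ** (-1 / a) - 1)


p = 0.5  # hypothetical observed proportion of differing sites
print(f"equal rates:         d = {jc69_distance(p):.3f}")
for a in (2.0, 0.5):
    print(f"gamma shape a={a}: d = {jc69_gamma_distance(p, a):.3f}")
```

For p = 0.5 the equal-rates estimate is about 0.82 substitutions per site, against about 1.10 for a = 2 and 3.0 for a = 0.5, so the treatment of rate variation changes the answer substantially.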

Plug away -- this is great work.  ML is a great tool and further
development of reasonable evolutionary models will be useful.  It is just
that nasty "grain of salt" that is too often missing from the ML
literature that bothers me. 

> Hoping this explains what I was trying to say without getting me into
> the usual flame wars,

This response is not intended to be inflammatory.  I am just trying to add
some balance to the discussion of ML.

-- 
Guy Hoelzer                              e-mail:  hoelzer at med.unr.edu
Department of Biology                    phone:   702-784-4860
University of Nevada Reno                fax:     702-784-1302
Reno, NV  89557




More information about the Mol-evol mailing list