Ed Rybicki, in reference to viral phylogenies, talked about pushing
"the probable evolutionary history back several hundred million years."
I, too, have dabbled in molecular phylogenetic analysis of viral
sequences. In addition to a concern about how predictably the molecular
clock ticks, I am concerned about possibly overinterpreting trees
obtained by phylogenetic analysis. My concern is a general one, not
directed at Ed's work. His comments just served as the impetus for
writing this down.
As I see it, trees have two separable uses. For one, they can be
used for taxonomic purposes, for saying which viruses are most closely
related to one another and how closely related groups of viruses are to
one another. This, of course, is very useful, since molecular
biological results obtained with one virus can often be extrapolated to
closely related viruses. I have no quibble with that.
The second use is to infer the evolutionary history of the viruses
we know today. For some viruses, usually isolates of the same virus,
the inferences are probably correct because there is additional
information supporting the view of direct common ancestry, such as
correlations with dates of isolation, geographic clustering or
commonalities in gene organization. For other comparisons, particularly
(but not limited to) those where nucleic acid sequences are of no use
and comparisons need to be made at the amino acid sequence level, I am
less sure. In some cases (cases where there are lots of OTUs to look
at), the data are consistent with a star topology, everything radiating
from one common ancestor sequence. One interpretation of the star
topology is that there are only certain sequence classes in sequence
space that are consistent with a successful infectious virus. If
viruses evolve, they jump from one of these "fitness" peaks to another.
Here is my concern. If we have a tree whose statistics suggest
that the star topology is not applicable, how can we be sure that the
topology implying a line of descent is not due to limited sampling? If
we gather more and more sequences of related viruses, might we not find
that our initial tree was erroneous and we are really dealing with a
star topology?
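
To make the "statistics" side of this concrete, here is a minimal sketch in Python of the four-point condition for a quartet of sequences. It is only an illustration under stated assumptions: the function name quartet_internal_branch and the distance values are made up for the example, and the distances are assumed to be additive (already corrected for multiple hits). For a resolved tree, one of the three pairwise-distance sums is smaller than the other two; for a star, all three are equal and the estimated internal branch has length zero. The sketch does not settle the sampling question; it only shows which quantity would have to be distinguishable from zero.

```python
def quartet_internal_branch(d_ab, d_cd, d_ac, d_bd, d_ad, d_bc):
    """Return (best_split, internal_length) for four taxa a, b, c, d.

    Uses the four-point condition: on a tree with split ab|cd,
    d_ab + d_cd is the smallest of the three pair sums, and the
    internal branch length is half the gap to the other two sums.
    """
    sums = {
        "ab|cd": d_ab + d_cd,
        "ac|bd": d_ac + d_bd,
        "ad|bc": d_ad + d_bc,
    }
    # The split with the smallest sum is the one the distances favour.
    best = min(sums, key=sums.get)
    # Average the two larger sums, since noisy data rarely makes them equal.
    others = [v for k, v in sums.items() if k != best]
    internal = (sum(others) / 2 - sums[best]) / 2
    return best, internal


# Hypothetical distances: pendant branches of 1.0, internal branch of 0.5.
resolved = quartet_internal_branch(2.0, 2.0, 2.5, 2.5, 2.5, 2.5)

# Hypothetical star: every pair is 2.0 apart, so no split is favoured
# and the internal branch estimate is zero.
star = quartet_internal_branch(2.0, 2.0, 2.0, 2.0, 2.0, 2.0)

print(resolved, star)
```

With real data the internal length will almost never be exactly zero, so the question becomes whether it differs from zero by more than sampling error, and whether it stays that way as more sequences are added.
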
HIV-1 studies come to mind here. In recent years it has become
clear that there are at least eight subtypes of HIV-1. Within a
subtype, there appears to be good evidence for at least some descent
from common ancestors. The subtype branches, however, appear to emerge
at about the same time from an ancestral HIV-1. The question of how
this is explained was asked here recently, and the answers so far, the
latest from Scutero, seem inadequate to me. Is it possible that there are only a limited
number of regions of sequence space (corresponding to subtypes)
compatible with a successful HIV-1? This would mean that the sequence
space between subtypes consists of HIV-1 variants with poor "fitness". It
would also mean that the subtypes were generated by rare events of a
virus of one subtype jumping into another peak in sequence space.
Arguing against this are the observations of apparently successful
recombinants between subtypes.
I feel that I am at the limit of my knowledge and understanding.
Thus, I stop and would appreciate reading corrections and other
opinions.
Ed, see, you drew me out of my shell again! :-)
Ulrich Melcher umelcher at bmb-fs1.biochem.okstate.edu
Department of Biochemistry and Molecular Biology 246 NRC
Oklahoma State University
Stillwater OK 74078-3035 USA