In article <beetle-0911950936310001 at bembidion.agforbes.arizona.edu>,
beetle at ag.arizona.edu (David Maddison) wrote:
> I am involved in an analysis of some sequences, and it
> is unclear if all of them really do share a history.
> That is, some of them may not actually be part of the
> same phylogenetic tree, and may represent independent
> derivations of the same function.
>
> I'd like to know what literature is out there that
> deals with methods of testing to see if sequences
> really share a phylogenetic history. What papers
> are people aware of on this issue?
>
> Thanks,
> David
>
> --
> David R. Maddison
> Department of Entomology
> University of Arizona
> Tucson, AZ 85721
>
> beetle at ag.arizona.edu
Hi David,
It sounds as if you want to test whether estimates from different data
partitions (e.g., genes) are significantly different (more different than
would be expected by stochastic variation). There are a couple of tests
available that might prove acceptable.
I assume that you have aligned sequences:
Species_1 partition_1 partition_2 ... partition_n
Species_2 partition_1 partition_2 ... partition_n
Species_3 partition_1 partition_2 ... partition_n
.
.
.
Species_s partition_1 partition_2 ... partition_n
Jim Bull and I (Huelsenbeck and Bull, Systematic Biology, in press)
propose a likelihood-ratio test (the likelihood heterogeneity test)
to evaluate the hypothesis that differences in phylogenetic estimates
can be explained by stochastic variation. In our application, we
specifically test for heterogeneity in topology (branching order)
but the test is trivially modified to evaluate other aspects of the
phylogenetic model. The likelihood heterogeneity test compares the
likelihood (L0) obtained under the constraint that the same phylogeny
underlies all of the data sets to the likelihood (L1) obtained when
this constraint is relaxed. Under the null hypothesis, H0, the same
tree is assumed to underlie the data from different genes, although
the rates of evolution as well as other parameters are allowed to vary
between the genes. Not only are the overall rates (for the genes as
wholes) allowed to vary, but the relative rates (from branch to branch
of the trees) can also differ among genes. Under the alternative
hypothesis, H1, different trees as well as evolutionary rates can
underlie each gene. The likelihood ratio test statistic is
d = 2(ln L1 - ln L0).
Because the null hypothesis is nested within the alternative
hypothesis, this statistic would ordinarily be expected to follow,
asymptotically, a chi-square distribution with (n - m) degrees of
freedom, where n is the number of free parameters under H1 and m is
the number of free parameters under H0 (Rice, 1995). However, Goldman
(1993) has shown that for the phylogeny problem the chi-square
distribution is not appropriate, and instead suggested Markov
simulation of the null distribution to determine the critical values
for d. Since no asymptotic result is available that holds for all
parameter values under the null hypothesis, the maximum likelihood
estimates of those parameters are used in the simulations. The
simulations thus assume the same tree for all genes but different
branch lengths (and other parameter values) among data partitions.
I've done some very limited simulations, and it seems that the parametric
bootstrap approach does a good job of generating the null distribution.
Jim and I have also applied the method to the problem of amniote relationships.
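To make the bookkeeping concrete, here is a minimal sketch in Python
(not the code used for the paper). It assumes you already have ln L0 and
ln L1 for the real data, plus a list of d values computed the same way
from data sets simulated under H0. The function names are made up for
illustration, and the p-value is taken simply as the fraction of
simulated d values at least as large as the observed one.

```python
def lrt_statistic(ln_l0, ln_l1):
    """Likelihood-ratio statistic d = 2(ln L1 - ln L0)."""
    return 2.0 * (ln_l1 - ln_l0)


def bootstrap_p_value(d_observed, d_simulated):
    """Fraction of simulated statistics >= the observed statistic.

    d_simulated holds d values from data sets simulated under H0
    (same tree for every partition, parameters at their ML estimates).
    """
    if not d_simulated:
        raise ValueError("need at least one simulated replicate")
    exceed = sum(1 for d in d_simulated if d >= d_observed)
    return exceed / len(d_simulated)
```

The simulation step itself (generating sequences on the H0 tree and
re-estimating both likelihoods for each replicate) is left out, since it
depends on whatever likelihood program you use.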
Farris et al. (Cladistics, 1995) also proposed a test that addresses the
same problem using, of course, parsimony as the optimality criterion. They
use as the test statistic the Mickevich-Farris index:
MF = Lcombined - Sum_over_all_partitions(Li)
where L is the length of the tree for either the combined data or for the
i-th data partition. They propose that the null distribution for this
test statistic be determined by randomly reassigning the characters
(sampling without replacement) to new data partitions of the same sizes
as the original ones. Swofford also proposed this
resampling scheme to me several years earlier and has implemented the
method in PAUP* 4.0 (as the combinability test). You might want to talk
with him. The advantage of this test is that it can be applied to both
molecular and morphological data. The disadvantage appears to be power.
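The resampling scheme can be sketched roughly as follows (Python, an
illustration only, not the PAUP* implementation). Each partition is
taken to be a list of characters (alignment columns). The tree_length
argument is a hypothetical function you would have to supply: it should
return the length of the most parsimonious tree for a list of
characters. The names, the default number of replicates, and the
one-sided p-value are my assumptions.

```python
import random


def mf_index(partitions, tree_length):
    """MF = Lcombined - sum over partitions of Li."""
    combined = [col for part in partitions for col in part]
    return tree_length(combined) - sum(tree_length(p) for p in partitions)


def partition_test(partitions, tree_length, replicates=100, seed=0):
    """Return (observed MF, fraction of random repartitions with MF >= observed)."""
    rng = random.Random(seed)
    observed = mf_index(partitions, tree_length)
    sizes = [len(p) for p in partitions]
    pooled = [col for part in partitions for col in part]
    exceed = 0
    for _ in range(replicates):
        # Shuffle all characters, then cut into partitions of the original
        # sizes: this is sampling without replacement.
        shuffled = pooled[:]
        rng.shuffle(shuffled)
        new_parts, start = [], 0
        for size in sizes:
            new_parts.append(shuffled[start:start + size])
            start += size
        if mf_index(new_parts, tree_length) >= observed:
            exceed += 1
    return observed, exceed / replicates
```

Every replicate needs a tree search for each partition and for the
combined data, so in practice almost all of the time goes into
tree_length.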
I hope this is all helpful.
John Huelsenbeck
Department of Integrative Biology
University of California
Berkeley, CA 94720
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
John & Edna Huelsenbeck
johnh at mws4.biol.berkeley.edu
ednah at mws4.biol.berkeley.edu
http://mw511.biol.berkeley.edu/john/edna.html
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *