Treeing partial sequences

Mary K. Kuhner mkkuhner at phylo.genetics.washington.edu
Tue Mar 9 10:34:45 EST 1993


In article <1993Mar8.190355.21445 at husc3.harvard.edu> robison1 at husc10.harvard.edu (Keith Robison) writes:

>I would like to make a tree for a gene family, but the problem is that
>the known members of the family are a mix of complete sequences,
>N-termini, and C-termini.  I don't need things to be perfect, just
>a decent guess at the phylogeny.  What is the best way to go about this?

>Keith Robison
>Harvard University
>Department of Cellular and Developmental Biology
>Department of Genetics / HHMI

>robison at biosun.harvard.edu 

Two possible ideas (I certainly won't claim to know the best way):

1.  Use a distance method such as neighbor-joining, and calculate
distances between sequences as percent difference over the positions
that both sequences cover--you can calculate this even with sequences
of different lengths, though it will not be as accurate as it would
with full sequences, especially if variability is not evenly
distributed across your gene.

2.  Make a number of subtrees (using whatever method you would normally
prefer)--for example, one that includes every sequence for which you
have C-terminus data, truncating the complete sequences to that region,
and another that includes every sequence for which you have N-terminus
data.  Then combine the subtrees using a consensus tree method.
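
As a rough illustration of idea 1, here is a minimal Python sketch of a
percent-difference calculation that only counts alignment columns where
both sequences have data.  It assumes the sequences are already aligned
to a common length and that unsequenced positions are written as '-'.
The function names, the missing-data symbol, and the toy sequences are
all made up for illustration.

```python
from itertools import combinations

MISSING = "-"  # assumption: unsequenced or gapped positions are written as '-'


def percent_difference(a, b, min_overlap=1):
    """Percent of differing sites over columns where both sequences have data.

    Returns None when fewer than min_overlap columns are shared, for
    example an N-terminal fragment compared with a C-terminal fragment
    that does not overlap it.
    """
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    # Keep only the columns where neither sequence is missing.
    shared = [(x, y) for x, y in zip(a, b) if x != MISSING and y != MISSING]
    if len(shared) < min_overlap:
        return None
    diffs = sum(1 for x, y in shared if x != y)
    return 100.0 * diffs / len(shared)


def distance_table(aligned):
    """Pairwise percent differences for a dict of name -> aligned sequence."""
    return {
        (n1, n2): percent_difference(aligned[n1], aligned[n2])
        for n1, n2 in combinations(aligned, 2)
    }


# Hypothetical toy alignment: one complete sequence and two fragments.
toy = {
    "full":   "MKTAYIAKQR",
    "n_term": "MKTGY-----",
    "c_term": "-----IAKQK",
}
table = distance_table(toy)
for pair, dist in table.items():
    print(pair, dist)
```

Note that the N-terminal and C-terminal fragments here share no columns,
so their distance is undefined (None).  A distance method such as
neighbor-joining needs a value for every pair, so pairs like this have
to be dealt with separately before building the tree.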

In either case, the resulting phylogeny should not be taken very
seriously, as in my experience there is not enough information in a gene
fragment to make a good phylogeny estimate.  At best it's a rough
sketch.

Mary Kuhner
Department of Genetics
University of Washington
mkkuhner at genetics.washington.edu



More information about the Mol-evol mailing list