Treeing partial sequences

Mary K. Kuhner mkkuhner at
Tue Mar 9 10:34:45 EST 1993

In article <1993Mar8.190355.21445 at> robison1 at (Keith Robison) writes:

>I would like to make a tree for a gene family, but the problem is that
>the known members of the family are a mix of complete sequences,
>N-termini, and C-termini.  I don't need things to be perfect, just
>a decent guess at the phylogeny.  What is the best way to go about this?

>Keith Robison
>Harvard University
>Department of Cellular and Developmental Biology
>Department of Genetics / HHMI

>robison at 

Two possible ideas (I certainly won't claim to know the best way):

1.  Use a distance method such as neighbor-joining, and calculate
distances between sequences as percent difference over the aligned
positions they share--you can calculate this even with sequences of
different lengths, though it will be less accurate than it would be
with full sequences, especially if variability is not evenly
distributed across your gene.

2.  Make a number of subtrees (using whatever method you would normally
prefer)--for example, one that includes every sequence for which you
have C-terminus data, truncating the long ones, and another that
includes every sequence for which you have N-terminus data.  Then
combine the subtrees using a consensus tree method.
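
As a rough illustration of idea 1, here is a minimal Python sketch of a
percent-difference calculation that only counts positions where both
sequences have data.  It assumes the sequences are already aligned and
that missing regions are written as "-" or "?"; the function name
p_distance and the toy sequences are made up for the example.

```python
def p_distance(a, b, missing="-?"):
    """Proportion of differing sites among positions present in both sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    differing = 0
    for x, y in zip(a, b):
        if x in missing or y in missing:
            continue  # skip sites where either sequence has no data
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        return None  # no shared sites, so the distance is undefined
    return differing / compared


# Hypothetical toy alignment: one complete sequence and two fragments.
aligned = {
    "full":   "MKTAYIAKQR",
    "n_term": "MRTAY-----",
    "c_term": "-----LAKQR",
}

names = sorted(aligned)
distances = {
    (p, q): p_distance(aligned[p], aligned[q])
    for i, p in enumerate(names)
    for q in names[i + 1:]
}
```

Note that a pair with no shared positions (an N-terminus against a
C-terminus that do not overlap) gets None here.  Neighbor-joining needs
a value for every pair, so such pairs would have to be handled somehow
before building the tree.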
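
Idea 2 can be sketched in a similar way.  The Python below is only one
possible reading of "combine the subtrees": it prunes each subtree to
the sequences present in all of them and keeps the groupings every
subtree agrees on (a strict consensus).  Trees are treated as rooted
and written as nested tuples of names; the helper names and the example
trees are hypothetical.

```python
def leaves(tree):
    """Set of leaf names in a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def clades(tree, keep):
    """Clades (as frozensets of leaf names) after pruning leaves not in keep."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node]) & keep
        members = frozenset().union(*(walk(child) for child in node))
        if len(members) > 1:
            found.add(members)
        return members

    walk(tree)
    return found


def strict_consensus_clades(trees):
    """Nontrivial clades, on the shared sequences, found in every tree."""
    shared = frozenset(set.intersection(*(leaves(t) for t in trees)))
    common = set.intersection(*(clades(t, shared) for t in trees))
    # Drop the clade holding every shared sequence; it carries no information.
    return {c for c in common if len(c) < len(shared)}


# Hypothetical subtrees: X has only C-terminus data, Y only N-terminus data.
c_term_tree = ((("A", "B"), "C"), ("D", "X"))
n_term_tree = ((("A", "B"), "Y"), ("C", "D"))

agreed = strict_consensus_clades([c_term_tree, n_term_tree])
```

In this toy case the two subtrees agree only on grouping A with B.
Sequences that appear in just one subtree (X and Y) are dropped by this
sketch, so placing them would need some further step.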

In either case, the resulting phylogeny should not be taken very
seriously, as in my experience there is not enough information in a gene
fragment to make a good phylogeny estimate.  At best it's a rough

Mary Kuhner
Department of Genetics
University of Washington
mkkuhner at
