multiple instances of the same sequence

Mary K. Kuhner mkkuhner at kingman.genetics.washington.edu
Mon Aug 13 18:16:48 EST 2001


In article <9l9hk6$qko$1 at mercury.hgmp.mrc.ac.uk>,  <rodgers at onramp.net> wrote:
>I've generated a database of exon2 sequences from MHC class I genes,
>and am adding to it new sequences from a cloning project which picks
>up sequences from more than a single locus.  If I enter a new sequence
>that is identical to one already in the database, what effect does
>that have on the tree generated?   That is, will the calculations be
>biased if multiple instances of the same or nearly identical sequences
>are entered?
>-John Rodgers

You don't say how you are generating your tree, but most tree
inference programs shouldn't care--the two identical sequences will
just group tightly together and should not perturb the rest of
the tree.

However, if you are putting your data into some program that attempts to
infer something from the shape of the tree, the spurious duplicate tips
might bias your results.

If you get too many duplicates, your tree inference will slow down
tremendously, so it might be worthwhile to go through the database and
clean them out.  One simple approach would be to use a program that
computes distances between sequences, and look for zeroes.
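
As a rough illustration of the "look for zeroes" idea, here is a minimal
Python sketch. It assumes the sequences are already aligned to the same
length and held in a dictionary of name -> sequence; the function names
(p_distance, zero_distance_pairs) and the toy clone names are made up for
this example and do not come from any particular package. It uses a plain
proportion-of-differences distance, which is enough for spotting exact
duplicates; any distance program should give the same zeroes.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        return 0.0
    diffs = sum(1 for x, y in zip(a.upper(), b.upper()) if x != y)
    return diffs / len(a)


def zero_distance_pairs(seqs):
    """Return the pairs of names whose pairwise distance is exactly zero."""
    return [
        (name1, name2)
        for (name1, seq1), (name2, seq2) in combinations(seqs.items(), 2)
        if p_distance(seq1, seq2) == 0.0
    ]


# Hypothetical toy data, not real MHC sequences.
example = {
    "cloneA": "ACGTACGT",
    "cloneB": "ACGTACGT",
    "cloneC": "ACGTACGA",
}

duplicates = zero_distance_pairs(example)
print(duplicates)  # [('cloneA', 'cloneB')]
```

This compares every pair, so the work grows with the square of the number
of sequences. If only exact duplicates matter, grouping on the sequence
string itself would be faster; the pairwise version is shown because it
also lets you flag near-identical sequences by raising the threshold above
zero.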

Mary Kuhner mkkuhner at genetics.washington.edu
