how to treat gaps in alignments for distance calculations?

joe at removethispart.gs.washington.edu
Fri Oct 25 18:35:32 EST 2002


In article <apccod$mpj$1 at mercury.hgmp.mrc.ac.uk>,
Tilman Lamparter  <lamparte at zedat.fu-berlin.de> wrote:
>How are gaps to be treated when aligned protein sequences are taken to
>obtain distance matrices? Should the regions be excised in all sequences?
>I use the Phylip protdist program with Jones-Taylor-Thornton model or
>Dayhoff PAM matrix. I always get different results when alignments with
>and without gaps are compared.

I get asked this question a lot.

(1) All modern parsimony, distance, and likelihood programs can cope with
    gaps, so don't remove them.  But ...
(2) Almost no programs make use of the information provided by the presence or
    absence of the gap.  They just consider it missing data, as if you
    forgot to record the amino acids.  The exception is the growing but still
    not too useful statistical literature on models including insertions and
    deletions.
(3) However, even if you are not worried about that loss of information, in
    practice the regions with lots of gaps are also those that
    (a) tend to have higher rates of change, and
    (b) tend to be badly aligned.

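So there are arguments on each side of the issue: excising the gapped
regions throws away data, but keeping them brings in the sites that are most
likely to be fast-changing and badly aligned.  A useful and sophisticated
solution to the tree alignment problem would go far to alleviate these
worries.  It is a Big Need in computational molecular biology.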
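
As a minimal sketch of why alignments with and without gapped regions give
different distances, the Python below compares the two treatments on a toy
alignment.  It uses a plain p-distance (fraction of differing sites), not
the Jones-Taylor-Thornton or Dayhoff PAM models that protdist applies, and
the function names and the three sequences are hypothetical, made up for
illustration only.

```python
GAP = "-"

def p_distance(a, b):
    """Fraction of differing sites, skipping sites where either sequence
    has a gap (the gap is treated as missing data for that pair only)."""
    pairs = [(x, y) for x, y in zip(a, b) if x != GAP and y != GAP]
    if not pairs:
        raise ValueError("no ungapped sites to compare")
    return sum(x != y for x, y in pairs) / len(pairs)

def strip_gapped_columns(seqs):
    """Excise every alignment column in which any sequence has a gap."""
    keep = [i for i, col in enumerate(zip(*seqs)) if GAP not in col]
    return ["".join(s[i] for i in keep) for s in seqs]

# Hypothetical toy alignment: only the third sequence contains a gap.
aln = ["MKTAYIAK",
       "MKSAYLAK",
       "MK--YIGK"]

stripped = strip_gapped_columns(aln)

# Sequences 0 and 1 have no gaps, yet their distance changes when the
# gapped columns are excised, because those columns are lost for every pair.
d_keep = p_distance(aln[0], aln[1])            # 2 differences over 8 sites
d_strip = p_distance(stripped[0], stripped[1])  # 1 difference over 6 sites
print(round(d_keep, 3), round(d_strip, 3))
```

In this toy case the excised columns happen to contain one of the two
differences between the first two sequences, so stripping lowers their
distance.  That is the same direction one would expect if gapped regions
tend to change faster, as in point (3a), but real data need not behave
like this example.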

-- 
Joe Felsenstein         joe at removethispart.gs.washington.edu
 Department of Genome Sciences, University of Washington,
 Box 357730, Seattle, WA 98195-7730 USA
