insertion/deletion:how to weigh in a DNA sequence matrix?

newsmgr at merrimack.edu newsmgr at merrimack.edu
Mon Nov 17 18:04:25 EST 1997


Newsgroups: bionet.molbio.evolution
Subject: Re: insertion/deletion:how to weigh in a DNA sequence matrix?
Message-ID: <3470B075.4F91 at evol5.mbl.edu>
From: "Andrew J. Roger" <roger at evol5.mbl.edu>
Date: Mon, 17 Nov 1997 17:00:35 -0400
Reply-To: roger at evol5.mbl.edu
References: <64b984$c8q at net.bio.net>
Organization: Marine Biological Laboratory

Arlin Stoltzfus wrote:
> 
> Zhang Daming wrote:
> >
> > Dear colleagues,
> > I have a data matrix of ITS sequences in a tribe of Liliaceae, in which
> > several insertion/deletions, 4-16 bp long, appear informative (present
> > in several taxa, seemingly as synapomorphies). Unfortunately, when I
> > analysed them with PAUP, coding each aligned base involved as either "-"
> > or "?", I found that the indels were apparently uninformative and
> > somewhat homoplastic. Compared to base substitutions, "structural
> > variations" such as indels, inversions or translocations should be much
> > rarer, and should somehow be weighted above other characters. I tried
> > weighting them by different amounts, and even coding them as extra
> > characters in the matrix before running PAUP, but the results
> > did not improve. Could anyone tell me how to deal with this
> > problem?
> 
> It's not safe simply to assume that indel characters are more informative
> than nucleotide characters.  One should reduce the weight of indels
> to the extent that there is greater variation in mutation rates
> for different specific indels than in mutation rates for different
> specific nucleotide substitutions.  On the other hand, one should
> increase the weight of indels to the extent that they are much
> rarer evolutionary changes than nucleotide substitutions.  That is,
> the weighting must be some point of balance between conflicting
> factors.  Unless one has a highly sophisticated model of evolution
> that allows one to calculate a weighting, or very extensive
> analyses of the consistency of indels with other characters that
> gives one an empirically estimated weighting, then one can't
> know where this point of balance lies.
> 

Just to add another comment.  One should not assume that all insertion
and deletion events occur at the same rate.  It looks as though long
insertions and deletions are probably much rarer events than short
ones.  For an interesting study of the relative frequencies of
insertions and deletions of various sizes see:

Gu and Li, J. Mol. Evol. 40: 464-473 (1995)

In this paper they suggest that the frequencies of insertions and
deletions are described well by a power law:

Fk = C * k^(-b)

where Fk is the frequency of insertions or deletions with gap length k,
b is the *power parameter* and C is a constant.

The paper compares pseudogenes between humans and rodents and finds that
b lies between about 1.7 and 1.9 (the range widens to 1.6-2.3 when other
kinds of data are considered).

Thus it looks like the frequency of indels is inversely proportional to
roughly the square of their length.
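
To get a feel for what the power law implies, here is a minimal Python
sketch. It assumes b = 1.8, a value picked from inside the 1.7-1.9 range
above purely for illustration, and the function name is hypothetical.
Because C cancels out, it reports the frequency of a length-k indel
relative to a single-base indel.

```python
def relative_indel_frequency(k, b=1.8):
    """Frequency of a length-k indel relative to a length-1 indel,
    assuming Fk = C * k^(-b). The constant C cancels in the ratio.
    The default b = 1.8 is an illustrative assumption."""
    if k < 1:
        raise ValueError("gap length k must be at least 1")
    return float(k) ** (-b)


if __name__ == "__main__":
    # Gap lengths spanning the 4-16 bp indels mentioned in the question.
    for k in (1, 2, 4, 8, 16):
        print(k, relative_indel_frequency(k))
```

With b = 2 exactly, a 4 bp indel would come out 16 times rarer than a
single-base indel, which is the "square of the length" rule of thumb.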

Perhaps knowing this would allow one to build weighting schemes for
parsimony, or maximum likelihood models that take it into account.  One
would just need a way of estimating the frequency of a particular
nucleotide substitution relative to a particular size class of indel.
For example, one could estimate the relative frequency of single-base
indels, which are likely to be the most abundant, and then scale the
remaining size classes using the power law described above.  Surely
someone has tried this?
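
One way such a scaling might look, as a rough Python sketch only. Two
things here are assumptions and not from Gu and Li: the parameter
single_base_ratio (the estimated frequency of single-base indels
relative to a substitution, which would have to come from the data) and
the choice to weight a character by the inverse of its relative
frequency, with a substitution given weight 1. Other weight functions
are equally defensible. The function name and b = 1.8 are illustrative.

```python
def indel_weight(k, single_base_ratio, b=1.8):
    """Parsimony weight for a length-k indel, with a substitution = 1.

    single_base_ratio: assumed estimate of how frequent a 1 bp indel is
        relative to a nucleotide substitution (must be > 0).
    b: power parameter of Fk = C * k^(-b); 1.8 is an illustrative value.

    The weight is the inverse of the indel's frequency relative to a
    substitution. That inverse-frequency rule is an assumption.
    """
    if k < 1:
        raise ValueError("gap length k must be at least 1")
    if single_base_ratio <= 0:
        raise ValueError("single_base_ratio must be positive")
    # Scale the single-base estimate down to length k with the power law.
    relative_rate = single_base_ratio * float(k) ** (-b)
    return 1.0 / relative_rate


if __name__ == "__main__":
    # Hypothetical input: 1 bp indels half as frequent as substitutions.
    for k in (1, 4, 16):
        print(k, indel_weight(k, single_base_ratio=0.5))
```

Note that inverse-frequency weights grow quickly with k, so in practice
one might prefer to cap them or use a gentler transformation.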

Cheers
Andrew J. Roger



