heuristic searches

newsmgr at merrimack.edu newsmgr at merrimack.edu
Mon Apr 20 13:51:54 EST 1998

Newsgroups: bionet.molbio.evolution
Subject: Re: heuristic searches
Message-ID: <harshman.diespamdie-1704980823130001 at pm5-50.sfo.infi.net>
From: harshman.diespamdie at sjm.infi.net (John Harshman)
Date: 17 Apr 1998 16:21:55 GMT
References: <6gvvqa$ndq at net.bio.net>
Distribution: world
Organization: InfiNet
Lines: 64
NNTP-Posting-Host: pm5-50.sfo.infi.net

In article <6gvvqa$ndq at net.bio.net>, Sikes <dss95002 at uconnvm.uconn.edu> wrote:

> Colleagues,
> >This method is advocated and used by lots of people, including the
> >Maddison brothers. I use it too, and I set the number even lower, usually
> >from 2 to 5. If you want to spend more time, it's better to add random
> >addition replicates than to increase the number of trees per replicate.
> This procedure may greatly speed the finding of the shortest tree length 
> for one's dataset but my concern is that once that tree length is found, 
> how confident can one be about finding ALL the trees of that length?
> I would guess that some fraction of the equally parsimonious trees would 
> be missed by not searching each island to completion and thus one's 
> consensus tree would be an inaccurate representation of the quality of 
> one's dataset.  Can someone refute this with data (rather than opinion)?

Two issues: 1) Will you find *all* MPTs by this method? 2) Is a consensus
tree from this method an accurate representation of the full set of MPTs?

1) The originator of the thread mentioned the way to find all the MPTs: as
a second step, use all the trees you found by this method as starting
trees and swap. You will find all the trees in all the islands for which
you have at least one tree.
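The second step in point 1 amounts to a breadth-first swap that only keeps trees as short as the best one found. This is a minimal sketch, not how PAUP* or any other program implements it: the tree representation, the `neighbors` function (a hypothetical stand-in for one round of TBR or SPR rearrangements) and the `length` function (a hypothetical stand-in for the parsimony score) are placeholders. It also assumes the starting trees already include the shortest length, so a shorter tree turning up during swapping is simply ignored rather than triggering a new search.

```python
from collections import deque
from typing import Callable, Hashable, Iterable, Set


def complete_islands(
    seeds: Iterable[Hashable],
    neighbors: Callable[[Hashable], Iterable[Hashable]],
    length: Callable[[Hashable], int],
) -> Set[Hashable]:
    """Swap to completion from every starting tree.

    Returns every minimal-length tree reachable from a minimal-length
    seed through a chain of single rearrangements, i.e. all trees in
    every island that holds at least one seed.
    """
    seeds = list(seeds)
    best = min(length(t) for t in seeds)
    # Only seeds of the best length belong to an island of MPTs.
    frontier = deque(t for t in seeds if length(t) == best)
    found = set(frontier)
    while frontier:
        tree = frontier.popleft()
        # neighbors() is a hypothetical stand-in for TBR/SPR rearrangements.
        for nb in neighbors(tree):
            # Stay inside the island: keep only equally short trees.
            if nb not in found and length(nb) == best:
                found.add(nb)
                frontier.append(nb)
    return found
```

With toy "trees" numbered 0 to 6, lengths {0: 5, 1: 5, 2: 6, 3: 5, 4: 5, 5: 7, 6: 5}, and each number's neighbors being the adjacent numbers, seeds 1 and 6 recover the islands {0, 1} and {6}, but the island {3, 4} is never reached because no seed landed in it.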

2) Given that you have started with many random addition sequences, I
would suggest that you have a random sample of the treespace occupied by
MPTs, and that as you add random addition sequence replicates, the
consensus tree will converge on the consensus tree from all MPTs. I would
be worried only if the sample were biased in some way, and I don't see a
bias. Or rather, the only bias is that some islands may not be
represented, for which the cure is more replicates with few trees, not
finding more trees in each island. Sorry, I have no rigorous support for this.
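The case for more replicates rather than more trees per replicate can be put in rough numbers under a simplifying assumption that the post does not make: each random addition replicate lands in a given island independently, with some fixed probability p. Then the chance that n replicates all miss that island is (1 - p)^n, and the number of replicates needed to push that chance down to alpha is ceil(log(alpha) / log(1 - p)). In real searches p is unknown and differs between islands, so this is only a back-of-the-envelope sketch.

```python
import math


def prob_island_missed(p: float, n: int) -> float:
    """Chance that n independent replicates all miss an island that
    each replicate lands in with probability p (an assumed constant)."""
    return (1.0 - p) ** n


def replicates_needed(p: float, alpha: float = 0.05) -> int:
    """Smallest n with prob_island_missed(p, n) <= alpha, for 0 < p < 1."""
    return math.ceil(math.log(alpha) / math.log(1.0 - p))
```

For example, an island that each replicate hits 5% of the time needs 59 replicates to be found with at least 95% probability. The number of trees kept per replicate does not appear in either formula.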

> It would appear that some careful, controlled experiments should be 
> performed to see what the drawbacks are (if there are any)...[as an aside 
> it is interesting to note the lack of studies addressing the problem of 
> large dataset tree searching in comparison to the number of studies 
> dealing with branch support questions for smaller datasets.

I think the seldom-expressed position of many people, including me, is
that the search for MPTs is much less important than the search for
strongly supported nodes. I don't particularly care if I even find all (or
any of) the MPTs as long as the differences between the trees I find and
the real MPTs are all in rearrangements among poorly supported nodes,
which I collapse anyway.
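Comparing trees after collapsing poorly supported nodes can be sketched by writing each tree as a set of splits (bipartitions). This is a minimal sketch under stated assumptions: each split is the frozenset of taxa on the side away from a fixed reference taxon (assumed here to be A), so the same split always compares equal; the support values (think bootstrap percentages) and the thresholds are made up for illustration.

```python
from typing import Dict, FrozenSet, Set

# A split is the set of taxa on the side of an internal branch that does
# not contain the reference taxon (assumed here to be "A").
Split = FrozenSet[str]


def collapse(tree: Set[Split], support: Dict[Split, float], threshold: float) -> Set[Split]:
    """Remove every internal branch whose support falls below threshold.

    Splits missing from `support` are treated as having zero support.
    """
    return {s for s in tree if support.get(s, 0.0) >= threshold}


def same_after_collapsing(a: Set[Split], b: Set[Split],
                          support: Dict[Split, float], threshold: float) -> bool:
    """True if two trees differ only in branches below the threshold."""
    return collapse(a, support, threshold) == collapse(b, support, threshold)


DE, CDE, BDE = frozenset("DE"), frozenset("CDE"), frozenset("BDE")
tree1 = {DE, CDE}  # ((A,B),C,(D,E))
tree2 = {DE, BDE}  # ((A,C),B,(D,E))
support = {DE: 95.0, CDE: 40.0, BDE: 35.0}  # hypothetical support values
```

At a threshold of 50 the two trees collapse to the same tree, so it makes no difference whether a search found one, the other, or both; at a threshold of 30 they still disagree.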

> Of course, a
> strict consensus of many MPTs is a type of branch support measure...]

Yes it is, but a very poor one.
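One way to see how coarse a strict consensus is: it keeps only the splits found in every MPT, so a split present in all but one tree vanishes just as completely as one present in a single tree. In this minimal sketch each tree is a set of splits, each split being the frozenset of taxa on the side away from a reference taxon (assumed to be A), and the four six-taxon trees are hypothetical MPTs, not data. The split {D,E,F} occurs in three of the four trees yet drops out of the strict consensus, while the split frequencies and a majority-rule consensus keep it. Frequencies among MPTs are still not the same thing as support from the data; the sketch only illustrates the all-or-nothing behavior.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Set

# A split is the set of taxa on the side of an internal branch that does
# not contain the reference taxon (assumed here to be "A").
Split = FrozenSet[str]


def strict_consensus(trees: List[Set[Split]]) -> Set[Split]:
    """Splits present in every tree."""
    return set.intersection(*trees) if trees else set()


def split_frequencies(trees: List[Set[Split]]) -> Dict[Split, float]:
    """Fraction of trees that contain each split."""
    counts = Counter(s for t in trees for s in t)
    return {s: c / len(trees) for s, c in counts.items()}


def majority_rule(trees: List[Set[Split]], cutoff: float = 0.5) -> Set[Split]:
    """Splits present in more than `cutoff` of the trees."""
    return {s for s, f in split_frequencies(trees).items() if f > cutoff}


EF, DEF, CDEF = frozenset("EF"), frozenset("DEF"), frozenset("CDEF")
BDEF, BC, CEF = frozenset("BDEF"), frozenset("BC"), frozenset("CEF")
mpts = [
    {EF, DEF, CDEF},  # ((A,B),C,(D,(E,F)))
    {EF, DEF, BDEF},  # ((A,C),B,(D,(E,F)))
    {EF, DEF, BC},    # ((B,C),A,(D,(E,F)))
    {EF, CEF, CDEF},  # ((A,B),D,(C,(E,F)))
]
```

Here the strict consensus keeps only {E,F}, while {D,E,F} has frequency 0.75 and survives in the majority-rule tree.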


Just a couple more things:
   *Note the obvious spam-defeating modification
    to my address if you reply by email.
   *It's my belief that my posts are now making it into talk.origins.
    But anyway: if you read this in somebody else's post but never
    saw my original, please tell me. If you got this in email and
    respond in TO, please retain this part of my sig.
