Phylogeny on 1800 protein sequences (alignment length 1500 characters)

joe at removethispart.gs.washington.edu
Tue Dec 10 21:33:25 EST 2002


In article <48g7vu8ua2vgfgnms2tahatmaig7khsv4m at 4ax.com>,
Michael Spitzer  <professa at gmx.net> wrote:
>I'm about to infer a tree on approx. 1800 aligned protein sequences,
>with the length of the alignment being approx. 1500 characters.
...
>I tried PARBOOT (which uses PHYLIP programs), but PHYLIP turned out to be a
>lot slower than CLUSTALW (in fact, in the distance matrix calculation
>step: a single-CPU CLUSTALW job is nearly 20 times faster than a ~20-node
>PARBOOT job on our cluster).

PHYLIP, in its latest incarnation, could be made to run in parallel, and is
somewhat faster than before.  In particular, the Neighbor program's NJ
implementation has been sped up; we had programmed it inefficiently in the
3.5c version.  The latest alpha release (v3.6a3) should make Neighbor
much faster on large cases like this.  Your bottleneck sounds like
it was the DNADist distance program.  I think that is now somewhat faster
than before, but it isn't going to be really quick.

To take advantage of parallel processing, simply start multiple bootstrapping
runs (being careful not to let them overwrite each other's output tree files).
Thus, to do 100 replicates on a 4-processor machine, do four 25-replicate
runs (with different random number seeds), then take the resulting
output tree files and concatenate them.
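The bookkeeping for that split can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not part of PHYLIP: it does not run any PHYLIP program itself. Each chunk would still be run separately (Seqboot, then the distance program and Neighbor), each in its own directory so the default `outtree` files do not collide. `plan_runs` and `concatenate_tree_files` are hypothetical helper names. Seeds are generated in the form 4n+1, which is also odd; check the seed rule your PHYLIP version's prompt actually asks for.

```python
"""Sketch: split bootstrap replicates across processors, then merge tree files.

Hypothetical helpers, not PHYLIP tooling. Each chunk is assumed to be run
separately (one Seqboot -> distance program -> Neighbor pipeline per
processor, each writing its own tree file); this only plans and merges.
"""
from pathlib import Path
from typing import List, Tuple


def plan_runs(total: int, n_procs: int, base_seed: int = 1) -> List[Tuple[int, int]]:
    """Return (replicates, seed) pairs, one per non-empty run.

    Replicates are spread as evenly as possible; every run gets a different
    seed of the form 4n+1 (hence odd), so no two runs resample identically.
    """
    base, extra = divmod(total, n_procs)
    plan = []
    for i in range(n_procs):
        reps = base + (1 if i < extra else 0)
        seed = 4 * (base_seed + i) + 1  # distinct per run, odd, form 4n+1
        if reps > 0:  # skip idle processors when total < n_procs
            plan.append((reps, seed))
    return plan


def concatenate_tree_files(paths: List[Path], out_path: Path) -> int:
    """Concatenate per-run tree files into one file; return the tree count.

    Each Newick tree ends with ';', so plain concatenation keeps trees
    separate. A newline is added after any file lacking a trailing one.
    The count assumes no ';' appears inside taxon labels.
    """
    n_trees = 0
    with open(out_path, "w") as out:
        for p in paths:
            text = Path(p).read_text()
            if text and not text.endswith("\n"):
                text += "\n"
            out.write(text)
            n_trees += text.count(";")
    return n_trees


if __name__ == "__main__":
    # 100 replicates on a 4-processor machine
    print(plan_runs(100, 4))  # [(25, 5), (25, 9), (25, 13), (25, 17)]
```

The merged file can then be used as a single multi-tree input, for example to Consense, exactly as if all 100 replicates had come from one run.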

-- 
Joe Felsenstein         joe at removethispart.gs.washington.edu
 Department of Genome Sciences, University of Washington,
 Box 357730, Seattle, WA 98195-7730 USA
