PARBOOT - parallel bootstrapping package
WHAT IS PARBOOT?
PARBOOT is a parallel implementation of the bootstrapping function
available in many of the PHYLIP(1) phylogenetic analysis programs.
The programs in the PHYLIP package that support bootstrapping take
input files containing multiple resampled data sets (usually generated
with the PHYLIP program SEQBOOT) that could be processed independently.
Traditionally, however, the PHYLIP programs process these data sets
sequentially. This is usually a time-consuming process, which makes
analysis by the iterative bootstrapping method slow.
The PARBOOT application splits a multiple-dataset input file into its
independent datasets and processes them in parallel on multiple
hosts (or on one host with multiple CPUs). This means that the speed of a
bootstrap analysis improves roughly in proportion to the number
and power of the CPUs available to the user.
For example, the file "infile" might contain 100 resamplings of the
original data, as illustrated below (only three blocks are shown):
---- Start of file infile ----
Alpha ACCGGGTTTG GCA
Beta AGGGGGTTTC CCA
Gamma CGGTTTTTTC CCA
Delta GGGAAATTTT TCG
Epsilon GGGAAATTTC CCG
Alpha ACCGGGTTGG CCC
Beta AGGGGGTTCC CCC
Gamma CGGTTTTTCC CCC
Delta GGGAAATTTT CCC
Epsilon GGGAAATTCC CCC
Alpha AACCGGGCCC AAA
Beta AAGGGCCCCC AAA
Gamma CCGGTCCCCC AAA
Delta GGGGATTCCC CCC
Epsilon GGGGACCCCC CCC
---- End of file infile ----
Bootstrapping using a distance matrix method (the PHYLIP program "DNADIST")
could be performed using the "multiple dataset" option and specifying 100
data sets. As this is the same as running "DNADIST" 100 times on 100 different
files each containing one data set, DNADIST is a good candidate for parallel
processing: the analysis can be broken into smaller parts, each performed on
a separate host or CPU.
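The idea of treating one multiple-dataset file as many single-dataset
files can be sketched as follows. This is only an illustration, not
PARBOOT's own code: the function name split_datasets is hypothetical, and
it assumes that every data set begins with the usual PHYLIP header line
giving the number of species and the number of sites (the listing above
does not show these header lines).

```python
import re

# Assumption: a data set starts with a header line holding two integers
# (number of species, number of sites), as in standard PHYLIP input.
HEADER = re.compile(r"^\s*\d+\s+\d+\s*$")


def split_datasets(text):
    """Split multi-dataset PHYLIP-style text into one string per data set."""
    datasets = []
    current = []
    for line in text.splitlines():
        # A new header closes the data set collected so far.
        if HEADER.match(line) and current:
            datasets.append("\n".join(current) + "\n")
            current = []
        if line.strip():  # drop blank separator lines
            current.append(line)
    if current:
        datasets.append("\n".join(current) + "\n")
    return datasets


# Two of the resampled blocks from the example, with assumed header lines.
SAMPLE = """\
    5   13
Alpha     ACCGGGTTTG GCA
Beta      AGGGGGTTTC CCA
Gamma     CGGTTTTTTC CCA
Delta     GGGAAATTTT TCG
Epsilon   GGGAAATTTC CCG
    5   13
Alpha     ACCGGGTTGG CCC
Beta      AGGGGGTTCC CCC
Gamma     CGGTTTTTCC CCC
Delta     GGGAAATTTT CCC
Epsilon   GGGAAATTCC CCC
"""

if __name__ == "__main__":
    for number, dataset in enumerate(split_datasets(SAMPLE), start=1):
        print("data set", number, "has", len(dataset.splitlines()), "lines")
```

Each returned string could then be written to its own file and analysed
separately, which is what makes the work easy to distribute.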
Once PARBOOT is properly installed and configured, invoking it by typing:
"parboot infile dnadist y"
will result in the infile being split into individual data sets and
each dataset being run on the specified hosts simultaneously. The results
are collated as each sub-analysis is completed.
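The pattern of running sub-analyses at the same time and collating each
result as it finishes can be sketched in a few lines. This is a minimal
local illustration under stated assumptions, not a description of how
PARBOOT is implemented: the names analyse and run_parallel are
hypothetical, worker threads stand in for remote hosts, and analyse is a
placeholder for a real PHYLIP run on one data set.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def analyse(dataset):
    """Hypothetical stand-in for one sub-analysis.

    A real worker would run a PHYLIP program on the data set; here we
    simply count its lines so the example stays self-contained.
    """
    return len(dataset.splitlines())


def run_parallel(datasets, workers=4):
    """Run analyse() on every data set concurrently and collate results."""
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Remember which data set each submitted job belongs to.
        jobs = {pool.submit(analyse, d): i for i, d in enumerate(datasets)}
        # Collect each result as soon as its job completes.
        for job in as_completed(jobs):
            results[jobs[job]] = job.result()
    # Return the results in the original data set order.
    return [results[i] for i in range(len(datasets))]


if __name__ == "__main__":
    print(run_parallel(["a\nb\n", "c\n", ""], workers=2))
```

Results are gathered in completion order but returned in input order, so
the combined output does not depend on which worker happens to finish
first. That ordering is a design choice of this sketch.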
PARBOOT requires the following:
- Networked UNIX computers.
- An account on all hosts that will be used.
- Working "rsh" and "rcp" commands.
- A perl interpreter on all hosts (version 4.0).
- The PHYLIP package accessible on all hosts.
See the INSTALLATION, getting.perl and getting.phylip documents for more
information on installing parboot.
PARBOOT can be obtained by gopher by pointing your gopher client at:
--> 5. Computational Molecular Biology- programs, documents, help/
--> 2. Phylogenetic analysis- programs, help, etc./
--> 5. Parboot/
or by anonymous ftp to:
Files can be retrieved from the /pub/parboot directory.
For more information about the parboot project, send email to the
Informatics Division of the Organelle Genome Megasequencing Program
at the Universite de Montreal:
ogmp at bch.umontreal.ca
All feedback welcome.
This work was supported in part by the Medical Research Council, Canada
(grant No. SP-34), the Canadian Genome Analysis & Technology program
(grant No. GO-12323) and Sun Microsystems.
Tim Littlejohn (Project Management),
Organelle Genome Megasequencing Program, Aug. 1994.
(1) Felsenstein, J. 1993. PHYLIP (Phylogeny Inference Package) version 3.5c.
Distributed by the author. Department of Genetics, University of
Copyright OGMP, 1994
ogmp at bch.umontreal.ca
E-mail:     tim at bch.umontreal.ca
Phone:      (514) 343-6111, x5149
Fax:        (514) 343-2210
Snail Mail: Departement de biochimie
            Universite de Montreal
            C.P. 6128, Centre-ville
            Montreal (Quebec), H3C 3J7