PARBOOT - parallel bootstrapping package available

Tim Littlejohn tim at megasun.BCH.UMontreal.CA
Wed Sep 28 17:11:39 EST 1994


PARBOOT - parallel bootstrapping package
----------------------------------------

WHAT IS PARBOOT?
^^^^^^^^^^^^^^^^
PARBOOT is a parallel implementation of the bootstrapping function 
available in many of the PHYLIP(1) phylogenetic analysis programs.
The programs in the PHYLIP package that support bootstrapping take
input files containing multiple resampled data sets (usually generated
with the PHYLIP program SEQBOOT).  These data sets are independent and
could be processed separately, but the PHYLIP programs process them
sequentially, one after another.  This is usually time consuming, and
can make analysis by the iterative bootstrapping method impractical.

The PARBOOT application splits a multiple-dataset input file into its
independent datasets and processes them in parallel on multiple hosts
(or one host with multiple CPUs).  The speed of a bootstrap analysis can
therefore improve roughly in proportion to the number and power of the
CPUs available to the user.

For example, the file "infile" might contain 100 resamplings of the
original data, as illustrated below (only three blocks shown):

---- Start of file infile ----
    5   13
Alpha        ACCGGGTTTG GCA
Beta         AGGGGGTTTC CCA
Gamma        CGGTTTTTTC CCA
Delta        GGGAAATTTT TCG
Epsilon      GGGAAATTTC CCG
    5   13
Alpha        ACCGGGTTGG CCC
Beta         AGGGGGTTCC CCC
Gamma        CGGTTTTTCC CCC
Delta        GGGAAATTTT CCC
Epsilon      GGGAAATTCC CCC
    5   13
Alpha        AACCGGGCCC AAA
Beta         AAGGGCCCCC AAA
Gamma        CCGGTCCCCC AAA
Delta        GGGGATTCCC CCC
Epsilon      GGGGACCCCC CCC
(...)
---- End of file infile ----
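
The splitting step can be illustrated with a short Python sketch.  This is
not PARBOOT's own code (PARBOOT is written in perl); it is a minimal
illustration that assumes every data set has the layout shown above: one
header line giving the number of taxa and sites, followed by exactly that
many sequence lines.  Interleaved or multi-line PHYLIP files would need
more careful handling.  The function name split_datasets is hypothetical.

```python
def split_datasets(text):
    """Split a multi-dataset PHYLIP file into one string per data set.

    Assumption: each data set is a header line "<taxa> <sites>" followed
    by exactly <taxa> sequence lines (sequential, non-interleaved layout).
    """
    # Ignore blank lines so stray separators do not confuse the parser.
    lines = [ln for ln in text.splitlines() if ln.strip()]
    datasets = []
    i = 0
    while i < len(lines):
        # The first field of the header is the number of taxa in the block.
        n_taxa = int(lines[i].split()[0])
        block = lines[i : i + 1 + n_taxa]
        if len(block) != 1 + n_taxa:
            raise ValueError("truncated data set at non-blank line %d" % (i + 1))
        datasets.append("\n".join(block) + "\n")
        i += 1 + n_taxa
    return datasets
```

Each returned string is a complete single-dataset input that could be
written to its own file and analysed independently.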

Bootstrapping using a distance matrix method (the PHYLIP program "DNADIST")
could be performed using the "multiple dataset" option and specifying 100
data sets.  As this is equivalent to running DNADIST 100 times on 100
different files, each containing one data set, DNADIST is a good candidate
for breaking the analysis into smaller parts and performing them in
parallel on multiple machines.

Once PARBOOT is properly installed and configured, invoking it by typing:

	 "parboot infile dnadist y"

will result in the infile being split into individual data sets, which
are then run simultaneously on the specified hosts.  The results are
collated as each sub-analysis is completed.
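
The dispatch-and-collate pattern described above can be sketched in Python
as follows.  This is only an illustration under stated assumptions, not
PARBOOT's implementation: PARBOOT distributes work to remote hosts with
"rsh" and "rcp", whereas this sketch uses local worker threads.  The names
run_in_parallel and count_lines are hypothetical, and count_lines is merely
a stand-in for a real analysis such as a DNADIST run.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_in_parallel(datasets, analyse, max_workers=4):
    """Apply analyse() to every data set concurrently.

    Results are stored as each job finishes, but returned in the same
    order as the input data sets.
    """
    results = [None] * len(datasets)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Remember which input position each submitted job belongs to.
        futures = {pool.submit(analyse, d): i for i, d in enumerate(datasets)}
        for fut in as_completed(futures):
            results[futures[fut]] = fut.result()
    return results


def count_lines(dataset):
    # Hypothetical stand-in for a real per-dataset analysis.
    return len(dataset.splitlines())
```

Keeping the input index alongside each job means the collated output does
not depend on which sub-analysis happens to finish first.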


PLATFORMS/OPERATING SYSTEMS
^^^^^^^^^^^^^^^^^^^^^^^^^^^
PARBOOT requires the following:

   - Networked UNIX computers.
   - An account on all hosts that will be used.
   - Working "rsh" and "rcp" commands.
   - A perl interpreter on all hosts (version 4.0).
   - The PHYLIP package accessible on all hosts.

See the INSTALLATION, getting.perl and getting.phylip documents for more
information on installing parboot.


OBTAINING PARBOOT
^^^^^^^^^^^^^^^^^
PARBOOT can be obtained by gopher by pointing your gopher client at:
  
        megasun.bch.umontreal.ca
  
  and selecting:

 -->  5.  Computational Molecular Biology- programs, documents, help/
   -->  2.  Phylogenetic analysis- programs, help, etc./
     -->  5.  Parboot/

or by anonymous ftp to:
  
        megasun.bch.umontreal.ca
  
Files can be retrieved from the /pub/parboot directory.


FURTHER INFORMATION
^^^^^^^^^^^^^^^^^^^
For more information about the parboot project, send email to the  
Informatics Division of the Organelle Genome Megasequencing Program
at the Universite de Montreal:
  
        ogmp at bch.umontreal.ca
  
All feedback welcome.
  

ACKNOWLEDGMENTS
^^^^^^^^^^^^^^^
This work was supported in part by the Medical Research Council, Canada
(grant No. SP-34), the Canadian Genome Analysis & Technology program
(grant No. GO-12323) and Sun Microsystems.


AUTHORS
^^^^^^^
Pierre Rioux,
Tim Littlejohn (Project Management),
Organelle Genome Megasequencing Program, Aug. 1994.


REFERENCES
^^^^^^^^^^
(1) Felsenstein, J.  1993.  PHYLIP (Phylogeny Inference Package) version 3.5c.
    Distributed by the author.  Department of Genetics, University of
    Washington, Seattle.


Copyright OGMP, 1994
ogmp at bch.umontreal.ca
-- 
==============================================================================
Tim Littlejohn

E-mail:     tim at bch.umontreal.ca  

Snail Mail: Departement de biochimie        Phone: (514) 343-6111, x5149
            Universite de Montreal          Fax:   (514) 343-2210 
            C.P. 6128, Centre-ville
            Montreal (Quebec), H3C 3J7
            CANADA
==============================================================================


