program for phylogenic analysis ?
hbadrane at pasteur.fr
Tue Oct 25 04:33:00 EST 1994
This file from the "expasy" gopher may be useful:
Date: 22 Apr 1993 22:30:50 +0000 (GMT)
From: joe at GENETICS.WASHINGTON.EDU (Joe Felsenstein)
Subject: SOME AVAILABLE PHYLOGENY PROGRAMS (LONG)
SOME AVAILABLE PHYLOGENY PROGRAMS
The following material (except item 0) is from the PHYLIP version 3.5
documentation. I post it because it may be a useful compilation. Here are
some of the phylogeny packages that I know about. Some of them are available
over Internet from ftp server machines. If you are on Internet you should
familiarize yourself with ftp and with them (see entries 6 and 7 below for more
Table of Contents
 0. PHYLIP       9. VOSTORG     16. COMPROB
 1. PAUP        10. MEGA        17. MARKOV
 2. BIOSYS-1    11. Evomony     18. PHYSYS
 3. MacClade    12. COMPONENT   19. SINCAIDEN
 4. Hennig86    13. Turbotree   20. MUST
 5. ClaDOS      14. Molevol     21. GDE
 6. TreeAlign   15. CLINCH      22. TreeTool
0. PHYLIP is a free package of programs for inferring phylogenies,
including programs to carry out parsimony, compatibility, distance matrix,
invariants ("evolutionary parsimony") and likelihood methods on a variety of
different kinds of data. It is available in the recently-released versions
3.5c and 3.5p as C or Pascal source code and documentation, and in four forms
of executables: (i) for 386 and 486 systems under PCDOS, (ii) for 386 and 486
systems under Windows, (iii) for non-386 and non-486 PCDOS systems, and (iv)
for Macintosh systems. The C source code will also compile easily on most
workstations and mainframes that have a C compiler. PHYLIP has been
distributed by me since 1980, with over 2000 registered installations. New
features include programs to compute protein sequence distances, to
interactively modify a phylogeny, and to compute likelihoods in coalescent
models from samples of genealogies. Most programs in the C version no longer
have arbitrary limits on the numbers of sites or of species. Many other new
features have been added as well, such as new models for variation of
evolutionary rates among sites in the DNA likelihood programs.
PHYLIP is available by anonymous ftp from
evolution.genetics.washington.edu (IP number 126.96.36.199) in directory
pub/phylip. Users who cannot get it this way can also send enough formatted
diskettes, which will be returned with the particular form of the package and
its documentation written on them. Contact me (preferably by electronic mail)
for details of the diskette distribution or further information about anonymous
ftp distribution. The latest version of PHYLIP is version 3.51 which fixes
some bugs present in 3.5.
1. David Swofford of the Laboratory of Molecular Systematics, National
Museum of Natural History, Smithsonian Institution, Washington, D.C. has written
PAUP (Phylogenetic Analysis Using Parsimony). It can be ordered from the
Center for Biodiversity, Illinois Natural History Survey, 607 East Peabody
Drive, Champaign, Illinois 61820, U.S.A.
Since December, 1985, Swofford has been distributing precompiled
executable object-code versions of PAUP for the IBM PC and other MSDOS systems.
As of this writing (February, 1993) he has released version 3 (PAUP/Mac) for
the Macintosh, and later hopes to release version 3 for PCDOS systems and
ultimately for mainframes. The cost was $50, which will increase to $100 soon.
Orders received for the Mac version will be filled but the final printed
documentation will arrive later, as it is not completed yet.
PAUP 3.0 is probably the most sophisticated parsimony program. It allows
multistate characters, user-defined weights on individual state transitions,
Wagner, Camin-Sokal and Dollo parsimony methods, bootstrap confidence
intervals, and finding all most parsimonious trees by branch-and-bound. It
also has provision for computing Lake's linear phylogenetic invariants. PAUP
is (a great) many times faster than the parsimony programs in PHYLIP.
2. Swofford also distributes an older package of programs, BIOSYS-1,
including some phylogeny estimation programs, for use with gene frequency data,
with particular attention to distance methods. BIOSYS-1 is distributed on an
IBM PC-formatted floppy disk. Included are precompiled versions for the IBM PC
and source code for uploading to IBM, VAX/VMS, Unix, Prime and CDC mainframes
and minicomputers. The price is $25.00, from the same address as PAUP.
BIOSYS-2 is under development, but it is too early to anticipate a completion
3. If you have a Macintosh computer and any interest in discrete-state
parsimony methods (including DNA and protein parsimony), you should definitely
get MacClade. It was written by Wayne Maddison and David Maddison of the
University of Arizona. All distribution is by Sinauer Associates, Sunderland,
Massachusetts 01375, USA. Their phone number is: (413) 665 3722, FAX: (413)
665 7292. A disk with program, help file, and example data files, plus book
(which has about 100 pages of intro to phylogenetic theory, and 250 pages of
program instructions), is $75 U.S. ($40 for the book alone). Site licenses
are also available. An earlier and less capable Version 2 (which for example
cannot read nucleic acid sequences and has fewer features for discrete
characters) is also available by anonymous ftp from the EMBL, Indiana and
Houston molecular biology software servers. Their addresses are given below
under the descriptions of TreeAlign and ClustalV. MacClade 2.1 will be found
among their Mac software, as a squeezed and then binhexed file.
MacClade enables you to use the mouse-window interface to specify and
rearrange phylogenies by hand, and watch the number of character steps and the
distribution of states of a given character on the tree change as you do so.
MacClade is positively addictive and will give you a much better feel for the
tree and your data. It's the closest thing to a phylogeny video game that I
have seen. It has been influential in spurring the inclusion of interaction
and graphics into other phylogeny programs. (I have tried to supply this
functionality in PHYLIP by incorporating the programs MOVE, DOLMOVE, and
DNAMOVE, which act somewhat like MacClade). MacClade does not have a
sophisticated search algorithm to find best trees: it largely relies on you to
do it by hand (which is surprisingly effective), with only a local
rearrangement algorithm available to improve on that tree.
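The "number of character steps" that changes as a tree is rearranged can be
illustrated with a small sketch. The Python below counts the minimum number of
state changes for one unordered character on a rooted binary tree using
Fitch's algorithm. It is an illustration only, not MacClade's own code; the
nested-tuple tree encoding, the function name fitch_steps and the example data
are all assumptions made for this example.

```python
def fitch_steps(tree, states):
    """Return (state_set, steps) for one unordered character.

    tree   -- a tip name (str) or a 2-tuple of subtrees (hypothetical encoding)
    states -- dict mapping each tip name to its observed state
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_steps = fitch_steps(tree[0], states)
    right_set, right_steps = fitch_steps(tree[1], states)
    common = left_set & right_set
    if common:
        # The two subtrees can share a state here, so no extra step is needed.
        return common, left_steps + right_steps
    # No shared state: one change must occur on a branch below this node.
    return left_set | right_set, left_steps + right_steps + 1


# Hypothetical character scored in four species.
states = {"A": "G", "B": "G", "C": "T", "D": "T"}
tree_one = (("A", "B"), ("C", "D"))
tree_two = (("A", "C"), ("B", "D"))
steps_one = fitch_steps(tree_one, states)[1]
steps_two = fitch_steps(tree_two, states)[1]
print(steps_one, steps_two)
```

Moving species between the two groupings changes the step count, which is the
quantity one would watch while rearranging a tree by hand.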
4. J. S. Farris has produced Hennig86, a fast parsimony program including
branch-and-bound search for most parsimonious trees and interactive tree
rearrangement. Although complete benchmarks have not been published it is said
to be faster than Swofford's PAUP; both are a great many times faster than the
parsimony programs in PHYLIP. The program is distributed in executable object
code only and costs $50, plus $5 mailing costs ($10 outside of the U.S.).
The user's name should be stated, as copies are personalized as a copy-
protection measure. It is distributed by Arnold Kluge, Amphibians and
Reptiles, Museum of Zoology, University of Michigan, Ann Arbor, Michigan
48109-1079, U.S.A. It runs on PC-compatible microcomputers with at least 512K
of RAM and needs no math coprocessor or graphics monitor. It can handle up to
180 taxa and 999 characters. An 80386 version, Hennig386, is currently being
tested but no release date has yet been announced.
5. ClaDOS, an interactive program which allows rearrangement of trees and
their evaluation, mapping of characters into them, and more, is available for
PCDOS systems from Kevin Nixon, L. H. Bailey Hortorium, Cornell University, 467
Mann Library, Ithaca, New York 14853. I have been unable to get information
on its cost or method of distribution.
6. Jotun Hein, (Institute of Genetics and Ecology, University of Aarhus,
8000 Aarhus C, Denmark) has produced TreeAlign, a multiple sequence alignment
program that builds trees as it aligns DNA or protein sequences. It uses a
combination of distance matrix and approximate parsimony methods. TreeAlign
uses too much memory for it to run on PC's (DOS or Mac systems) but is really
designed for a workstation or mainframe. It is available by anonymous ftp at
the Indiana, Houston, and EMBL molecular biology software distribution sites.
Their network addresses are respectively: ftp.bio.indiana.edu,
ftp.bchs.uh.edu, and ftp.embl-heidelberg.de. In the Indiana archive one must
enter directory molbio/align, in the Houston archive it is in directory
pub/gene-server in the directories unix and vms, and on the EMBL archive it is
in pub/software/unix and pub/software/vax. If you are on Internet and use
molecular data it is important that you learn to use anonymous ftp and become
familiar with these ftp servers.
7. Another multisequence alignment program that estimates trees as it
aligns multiple sequences is ClustalV. An older version in PCDOS executable
form was distributed previously (see below for information on how to get
executables for PC or Mac for the current version). Currently it is
distributed as C source code by its author, Desmond Higgins. Clustal was
originally developed at Trinity College, Dublin, Ireland, but version V was
done at Higgins's current address, the European Molecular Biology Laboratory,
Heidelberg, Germany. Clustal V successfully compiles and runs on VAX/VMS C,
Apple Macintosh Think C, MSDOS Turbo C, Decstation ULTRIX C, and Sun
workstations with GNU C. It is a complete rewrite and upgrade of the Clustal
package which was described by Higgins and Sharp (1989).
New features include the ability to detect and read different input formats
(NBRF/ PIR, Fasta, EMBL/Swissprot); align old alignments; produce phylogenetic
trees after alignment (Neighbor Joining trees with a bootstrap option); write
different alignment formats (Clustal, NBRF/PIR, GCG, PHYLIP); full command line
The program is available by anonymous ftp at the Indiana, Houston, and
EMBL molecular biology distribution sites. Their network addresses are
respectively: ftp.bio.indiana.edu, ftp.bchs.uh.edu, and ftp.embl-heidelberg.de.
In the Indiana archive one must enter directory molbio/align, in the Houston
archive it is in directory pub/gene-server in all of the four directories dos,
Mac, unix, and vms, and on the EMBL archive it is in pub/software/unix or
pub/software/vax. If you are on Internet and use molecular data it is
important that you learn to use anonymous ftp and become familiar with one or
more of these ftp servers.
If you do not have any access to Internet, you could alternatively start
by sending e-mail to Des Higgins at:
higgins at EMBL-Heidelberg.DE (Internet)
If you do not have access to e-mail, send a formatted PC or MAC diskette
(PLEASE state which) to:
European Molecular Biology Laboratory
He will return the diskette with the source code and documentation. He can
also include an executable image for PC's or MAC.
8. Gary Olsen, of the Department of Microbiology, University of Illinois,
has developed a speeded-up version of my program DNAML coded in C, which he
calls "fastDNAml". It achieves a number of economies and also is organized so
that it can be run on parallel processors -- he and his co-workers have
constructed trees of very large size on a high-speed parallel processor. The
program can be compiled using the "p4" portable parallel processing toolkit.
It can also be run in ordinary serial mode on workstations where it is faster
than DNAML. The C program is available by anonymous ftp from the Ribosomal
Database Project at info.mcs.anl.gov in directory pub/RDP/programs/fastDNAml.
9. Andrey A. Zharkikh, Andrey Rzhetsky, and their co-workers in the
Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of
Sciences, Novosibirsk, Russia, Ex-USSR, have produced VOSTORG, a package of
programs for alignment (both manual and automatic) and inferring phylogenies by
distance methods and parsimony for molecular sequences. It runs on IBM PC-
compatibles and includes some rather fancy graphics. The authors are currently
in the U.S., not in Siberia, and their program is sold for about $250 by Exeter
Software, 100 North Country Road, Setauket, NY 11733, USA. Their telephone
number is 1-800-842-5892; Fax (516)751-3435. The programs are described in a
paper by Zharkikh et al. (1991).
10. MEGA (Molecular Evolutionary Genetic Analysis) is due to be released
at the beginning of 1993 by Sudhir Kumar, Koichiro Tamura, and Masatoshi Nei of
the Institute of Molecular Evolutionary Genetics, 328 Mueller Lab, Pennsylvania
State University, University Park, Pennsylvania 16802, U.S.A. It will be an
executable program for PCDOS machines, and will be menu-driven with context-
sensitive help. It will analyze data from DNA, RNA and protein sequences, and
distance matrices produced from other kinds of data as well. It will include
the Neighbor-Joining distance matrix method, a branch and bound
parsimony method, and bootstrapping. It will also plot trees on many kinds of
printers. The program will be provided free of charge if you send one 1.2 Mb
5.25-inch or 1.44 Mb 3.5-inch floppy diskette, and will be sent as soon as it
is available. Inquiries can also be made by mail to M. Nei at the above
address or by electronic mail to nxm2 at psuvm (Bitnet) or nxm2 at psuvm.psu.edu
11. James Lake will soon distribute "Evomony", a program for using the
"evolutionary parsimony" (invariants) method for inferring phylogenies from DNA
or RNA sequences. It runs on 286 and 386 PCDOS systems with at least 500k
bytes of memory. Lake intends to distribute a PCDOS version by April 1, 1993
(his choice of date, not mine!), with a Macintosh version to follow in 1994.
Both will be distributed free to scientists in this field. Exact procedures
for ordering Evomony have not yet been announced. Lake's address is Department
of Biology, University of California, Los Angeles, California 90024.
12. Rod Page has written COMPONENT, a program for PCDOS systems for
comparing cladograms for use in phylogeny and biogeography studies. It has far
more features for biogeographic studies (such as comparing species and area
cladograms) than any other package. It runs on PCDOS 286 or 386 systems under
Windows 3.0 or higher. It will be released in the very near future. Its cost
will be "in the $50-$75 range", and it can be ordered from Rod Page at the
Department of Botany, Natural History Museum, Cromwell Road, London SW7 5BD,
U.K. His phone and fax numbers are respectively (071)-938 9068 and 9260, and
his e-mail address is R.Page at natural-history-museum.imperial.ac.uk or
rdp at nhm.ic.ac.uk.
13. David Penny (Department of Botany and Zoology, Massey University,
Palmerston North, New Zealand) has been offering for free distribution several
PCDOS programs, one a fast parsimony program, TurboTree. There are also two
others, Hadtree which computes expected frequencies of all possible
distributions of nucleotides among species, and Great Deluge, an approximate
search for the most parsimonious tree by a quasi-random method. He tells me
that funding exigencies are such that he may soon have to start charging for
these. His electronic mail address is dpenny at massey.ac.nz.
14. Walter Fitch (Department of Ecology and Evolutionary Biology,
University of California, Irvine, California 92717, U.S.A.) has a package
"Molevol" available free (on receipt of an appropriate number of PCDOS
formatted floppy disks) with about 20 FORTRAN programs for not only estimating
trees by parsimony and distance methods but doing various other manipulations
of data that might be needed such as format interconversions and searching for
homology and secondary structure. They are available as FORTRAN source and/or
as PCDOS executables. The FORTRAN programs will also run on Sun workstations
(and probably others too, I would suspect). His electronic mail address is
wfitch at daedalus.bio.uci.edu.
15. Kent Fiala, now of SAS Institute, has written a compatibility (clique)
program, based on an earlier program written by Kent and George Estabrook.
Christopher Meacham has put the latest version of CLINCH (6.2), with Kent's
permission, as a self-extracting DOS archive on Jim Beach's TAXACOM fileserver,
huh.harvard.edu, for anonymous FTP. The self-extracting archive is
"CLINCH62.EXE" in directory /pub/software/clinch. This should be FTPed as a
binary file. CLINCH62.EXE is about 150 kb. When you run it, it will expand to
14 files requiring about 280 kb. The executable program is CLINCH.EXE.
Readme, documentation, sample input and output, and FORTRAN source code are
included. PC-CLINCH is probably the most sophisticated compatibility analysis
program. The Taxacom server, by the way, also has other material related to
botanical systematics, including flora information.
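The advice to transfer CLINCH62.EXE as a binary file can be sketched with
Python's standard ftplib module. This is a minimal sketch, not a tested recipe
for this server: the host, directory and file name are the ones quoted above
and may no longer be reachable, and the function name fetch_binary and its
"connect" parameter are assumptions introduced for the example.

```python
from ftplib import FTP


def fetch_binary(host, directory, filename, dest_path, connect=FTP):
    """Download one file by anonymous ftp in binary mode.

    Returns the number of bytes written to dest_path. The "connect" argument
    is a hypothetical hook so that a stand-in class can be used for testing.
    """
    written = 0
    ftp = connect(host)
    try:
        ftp.login()                # no arguments means an anonymous login
        ftp.cwd(directory)
        with open(dest_path, "wb") as out:
            def save(block):
                nonlocal written
                out.write(block)
                written += len(block)
            # retrbinary transfers in binary (image) mode, so the archive
            # is not altered by end-of-line translation.
            ftp.retrbinary("RETR " + filename, save)
    finally:
        ftp.close()
    return written


# Example call, using the names given in the text above:
# fetch_binary("huh.harvard.edu", "/pub/software/clinch",
#              "CLINCH62.EXE", "CLINCH62.EXE")
```

A text-mode transfer may rewrite line endings and corrupt a self-extracting
archive, which is why binary mode matters here.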
16. Christopher Meacham (Department of Integrative Biology, University of
California, Berkeley, California 94720, U.S.A.) produces COMPROB, a Pascal
program to compute probabilities that characters would be compatible at random,
thus telling us which clique is "most surprising". It is available for
anonymous ftp as a PCDOS executable from the Taxacom server (huh.harvard.edu)
in directory pub/mip.
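The notion of compatibility behind CLINCH and COMPROB can be illustrated for
the simplest case of two-state characters. Two binary characters are
compatible unless all four combinations of states (00, 01, 10, 11) occur
among the species, and a clique is a set of mutually compatible characters.
The Python sketch below is not taken from either program and does not compute
COMPROB's probabilities; the function names, the brute-force search and the
data are assumptions made for illustration.

```python
from itertools import combinations


def compatible(char_a, char_b):
    """Two binary characters are compatible unless all four state pairs occur."""
    return len(set(zip(char_a, char_b))) < 4


def largest_clique(characters):
    """Brute-force search for a largest set of mutually compatible characters."""
    names = sorted(characters)
    for size in range(len(names), 0, -1):
        for subset in combinations(names, size):
            if all(compatible(characters[a], characters[b])
                   for a, b in combinations(subset, 2)):
                return list(subset)
    return []


# Hypothetical data: each string gives a character's state in species 1 to 5.
characters = {
    "c1": "11000",
    "c2": "11100",
    "c3": "00011",
    "c4": "10010",
}
best = largest_clique(characters)
print(best)
```

The exhaustive search is practical only for a handful of characters; it is
meant to show what a clique is, not how a real program would find one.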
17. The program MARKOV computes a distance measure between pairs of
nucleotide sequences. It also constructs phylogenies from these and summarizes
the 4x4 substitution matrices between the pairs of species. It uses a more
general model of substitution than used in PHYLIP, the Stationary Markov Model
described in the paper by Saccone et al. in Methods in Enzymology volume 183,
pages 570-583, 1990. Bootstrapping is used to analyze the statistical error of
the results. Output files from CLUSTAL and PILEUP, as well as some other
formats, can be used for input, and analysis can be confined to certain codon
positions in coding sequences. The program is written in FORTRAN and runs on
VMS systems. It was produced by Dr. Graziano Pesole and Professor Cecilia
Saccone at the University of Bari, Italy, and is available (for free?) from Dr.
Cecilia Lanave at CSMME-CNR, Dipartimento di Biochimica e Biologia Molecolare,
Universita` di Bari, via Orabona 4, 70126 Bari, Italy. Her phone number is
39-80-243305, her fax number is 39-80-243317, and her e-mail address is
lanave at vaxba0.ba.it or mvx36 at ibacsata.it
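The idea of a 4x4 substitution matrix between a pair of sequences can be
sketched as follows. The Python below tallies aligned base pairs and derives
the simple proportion of differing sites. Note that this proportion is a
stand-in chosen for illustration, not the Stationary Markov Model distance
that MARKOV computes; the function names and the two short sequences are
likewise assumptions.

```python
BASES = "ACGT"


def substitution_counts(seq_a, seq_b):
    """Count aligned base pairs; rows index seq_a, columns index seq_b."""
    counts = [[0] * 4 for _ in range(4)]
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in BASES and b in BASES:      # skip gaps and ambiguity codes
            counts[BASES.index(a)][BASES.index(b)] += 1
    return counts


def p_distance(counts):
    """Proportion of compared sites at which the two sequences differ."""
    total = sum(map(sum, counts))
    same = sum(counts[i][i] for i in range(4))
    return (total - same) / total if total else 0.0


# Hypothetical aligned sequences; "-" marks a gap.
counts = substitution_counts("ACGTACGT-A", "ACGTTCGAGA")
for base, row in zip(BASES, counts):
    print(base, row)
print(p_distance(counts))
```

Restricting the loop to every third site would be one way to confine such a
tally to a single codon position.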
18. J. S. Farris and Mary Mickevich earlier released a package of
phylogeny programs, PHYSYS, which, at about $5,000, was extremely expensive (in
my opinion, which is certainly a biased one). I am not sure whether, from
whom, or under what conditions it is still available.
19. Fujitsu Ltd. ("a $21 billion global leader in advanced computer,
telecommunications, and electronic devices") sells for $28,000 US a Fujitsu S
family workstation complete with a program, SINCAIDEN, which allows
"experimental researchers, even those unfamiliar with such analyses, [to]
easily create phylogenetic trees in their own laboratories." The program also
allows searches of the major nucleic acid sequence and protein databases (the
ad I saw does not make it clear whether these databases are provided with the
workstation). The methods available are UPGMA, neighbor-joining, Farris's
(Distance Wagner) and the modified Farris distance matrix methods. The
workstation is SPARC compatible and runs SunOS. The SINCAIDEN program was
developed by the group at the National Institute of Genetics, Japan under Dr.
Takashi Gojobori. Fujitsu Ltd. may be contacted at 21-8, Nishi-Shinbashi 3-
chome, Minato-ku, Tokyo 105, Japan (phone 81-3-3437-5111 ext. 2831, fax 81-3-
5472-4354), or in the U.S. at Fujitsu America Inc., 3055 Orchard Drive, San
Jose, California 95134-2017 (phone 1-408-432-1300 ext. 5168, fax 1-408-434-
20. MUST, a package of sequence management programs, is distributed on a
shareware basis by Herve Phillippe, Laboratoire de Biologie Cellulaire (URA
CNRS 1134 D), Batiment 444, Universite de Paris-Sud, 91405 Orsay cedex, France.
His e-mail address is: adoutte at frciti51 on Bitnet/EARN. His phone and fax
numbers are respectively 188.8.131.52.64.81 and 184.108.40.206.21.30. MUST is
available on a shareware basis ($100 registration fee if you do not send
diskettes) and runs on PCDOS systems using PCDOS version 3 or later. It is
intended as complementary to existing phylogeny and alignment programs and can
produce output files in the formats of PHYLIP, PAUP, Hennig86, and CLUSTAL. It
contains a variety of sequence input, editing, checking, and storage functions,
as well as a sequence editor and a phylogeny plotter. It also allows further
analyses of the results from these phylogeny programs.
21. Steve Smith, formerly of the Harvard Genome Laboratory, has written
an X-Windows interactive sequence editor, GDE (Genetic Data Environment) which
allows the user to edit sequences and align them by hand, and to select subsets
of sites and sequences and call a variety of analysis programs including
ClustalV and many of the PHYLIP 3.4 programs. The GDE 2.0 system will run on
many workstations that have the X windowing system. It also includes the
TreeTool tree-plotting program (see below). GDE 2.0 is free and is available
by anonymous ftp either from golgi.harvard.edu in directory pub/GDE2.0 or
from ftp.bio.indiana.edu in directory molbio/unix/GDE.
22. Mike Maciukenas, at the Department of Microbiology of the University
of Illinois, has written a wonderful X-windows based interactive tree-plotting
program called TreeTool. It takes as input a PHYLIP tree file, with branch
lengths if they are provided, displays the tree in either rooted or unrooted
form on any X-windows screen, and allows the user to modify the form of the
tree and the placement of nodes and labels. When the tree is in final form the
user can have it written to a Postscript file and/or printed to a Postscript-
compatible printer. TreeTool is free as a C program for X windows and is
available for anonymous ftp from ftp.bio.indiana.edu in directory
molbio/unix/GDE. It is also included in the GDE 2.0 sequence analysis
environment mentioned above.
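PHYLIP tree files use the nested-parenthesis (Newick) notation, with an
optional ":length" after each name or group. As a rough sketch of what a
program reading such a file has to do, the Python below parses a simple tree
string into nested tuples. It is not TreeTool's code; the function names and
the example tree are assumptions, and quoted labels and bracketed comments are
deliberately left out.

```python
def parse_newick(text):
    """Parse a simple Newick string into (name, branch_length, children).

    Handles nested parentheses, names and optional ":length" fields only.
    """
    text = "".join(text.split())          # drop spaces and line breaks
    pos = 0

    def node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1                      # consume "("
            children.append(node())
            while text[pos] == ",":
                pos += 1
                children.append(node())
            if text[pos] != ")":
                raise ValueError("expected ')' at position %d" % pos)
            pos += 1                      # consume ")"
        start = pos
        while pos < len(text) and text[pos] not in ",():;":
            pos += 1
        name = text[start:pos]
        length = None                     # stays None when no length is given
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",();":
                pos += 1
            length = float(text[start:pos])
        return (name, length, children)

    root = node()
    if pos >= len(text) or text[pos] != ";":
        raise ValueError("tree must end with ';'")
    return root


def tip_names(tree):
    """List tip names in left-to-right order."""
    name, _, children = tree
    if not children:
        return [name]
    return [tip for child in children for tip in tip_names(child)]


# Hypothetical three-species tree with branch lengths.
tree = parse_newick("((Human:0.1,Chimp:0.2):0.05,Gorilla:0.3);")
print(tip_names(tree))
```

A plotting program would walk the same nested structure to place nodes and
labels, using the branch lengths where they are present.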
Joe Felsenstein, Dept. of Genetics, Univ. of Washington, Seattle, WA 98195
--> Internet: joe at genetics.washington.edu (IP No. 220.127.116.11)
Bitnet/EARN: felsenst at uwavm