
program for phylogenic analysis ?

Hassan BADRANE hbadrane at pasteur.fr
Tue Oct 25 04:33:00 EST 1994

This file from the "expasy" gopher is useful:

Date: 22 Apr 1993 22:30:50 +0000 (GMT)
From: joe at GENETICS.WASHINGTON.EDU (Joe Felsenstein)
     The following material (except item 0) is  from  the  PHYLIP  version  3.5
documentation.   I  post  it  because it may be a useful compilation.  Here are
some of the phylogeny packages that I know about.  Some of them  are  available
over  Internet  from  ftp  server  machines.  If you are on Internet you should
familiarize yourself with ftp and with them (see entries 6 and 7 below for more

                               Table of Contents
      0. PHYLIP              9. VOSTORG              16. COMPROB
      1. PAUP               10. MEGA                 17. MARKOV
      2. BIOSYS-1           11. Evomony              18. PHYSYS
      3. MacClade           12. COMPONENT            19. SINCAIDEN
      4. Hennig86           13. Turbotree            20. MUST
      5. ClaDOS             14. Molevol              21. GDE
      6. TreeAlign          15. CLINCH               22. TreeTool
      7. Clustal
      8. fastDNAml
     0.  PHYLIP is a  free  package  of  programs  for  inferring  phylogenies,
including  programs  to  carry  out  parsimony, compatibility, distance matrix,
invariants ("evolutionary parsimony") and likelihood methods on  a  variety  of
different  kinds  of  data.   It is available in the recently-released versions
3.5c and 3.5p as C or Pascal source code and documentation, and in  four  forms
of  executables:  (i) for 386 and 486 systems under PCDOS, (ii) for 386 and 486
systems under Windows, (iii) for non-386 and non-486 PCDOS  systems,  and  (iv)
for  Macintosh  systems.   The  C  source code will also compile easily on most
workstations  and  mainframes  that  have  a  C  compiler.    PHYLIP  has  been
distributed  by  me  since  1980, with over 2000 registered installations.  New
features  include  programs  to  compute   protein   sequence   distances,   to
interactively  modify  a  phylogeny,  and  to compute likelihoods in coalescent
models from samples of genealogies.  Most programs in the C version  no  longer
have  arbitrary  limits  on the numbers of sites or of species.  Many other new
features have been  added  as  well,  such  as  new  models  for  variation  of
evolutionary rates among sites in the DNA likelihood programs.
     PHYLIP      is      available      by       anonymous       ftp       from
evolution.genetics.washington.edu   (IP   number  in  directory
pub/phylip.  Users who cannot get it this way can also  send  enough  formatted
diskettes,  which  will be returned with the particular form of the package and
its documentation written on them.  Contact me (preferably by electronic  mail)
for details of the diskette distribution or further information about anonymous
ftp distribution.  The latest version of PHYLIP is  version  3.51  which  fixes
some bugs present in 3.5.
     1.  David Swofford of the Laboratory of  Molecular  Systematics,  National
Museum of Natural History, Smithsonian Institution, Washington, D.C., has written
PAUP (Phylogenetic Analysis Using Parsimony).   It  can  be  ordered  from  the
Center  for  Biodiversity,  Illinois  Natural  History Survey, 607 East Peabody
Drive, Champaign, Illinois  61820, U.S.A.
     Since  December,  1985,  Swofford  has  been   distributing   precompiled
executable object-code versions of PAUP for the IBM PC and other MSDOS systems.
As of this writing (February, 1993) he has released version  3  (PAUP/Mac)  for
the  Macintosh,  and  later  hopes  to  release version 3 for PCDOS systems and
ultimately for mainframes.  The cost was $50, which will increase to $100 soon.
Orders  received  for  the  Mac  version  will  be filled but the final printed
documentation will arrive later, as it is not completed yet.
     PAUP 3.0 is probably the most sophisticated parsimony program.  It  allows
multistate  characters,  user-defined  weights on individual state transitions,
Wagner,  Camin-Sokal  and  Dollo  parsimony   methods,   bootstrap   confidence
intervals,  and  finding  all  most parsimonious trees by branch-and-bound.  It
also has provision for computing Lake's linear phylogenetic  invariants.   PAUP
is (a great) many times faster than the parsimony programs in PHYLIP.
     2. Swofford also distributes  an  older  package  of  programs,  BIOSYS-1,
including some phylogeny estimation programs, for use with gene frequency data,
with particular attention to distance methods.  BIOSYS-1 is distributed  on  an
IBM PC-formatted floppy disk.  Included are precompiled versions for the IBM PC
and source code for uploading to IBM, VAX/VMS, Unix, Prime and  CDC  mainframes
and  minicomputers.   The  price  is  $25.00,  from  the  same address as PAUP.
BIOSYS-2 is under development, but it is too early to anticipate  a  completion
     3.  If you have a Macintosh computer and any  interest  in  discrete-state
parsimony  methods (including DNA and protein parsimony), you should definitely
get MacClade.  It was written by Wayne  Maddison  and  David  Maddison  of  the
University  of  Arizona.  All distribution is by Sinauer Associates, Sunderland,
Massachusetts 01375, USA.  Their phone number is: (413) 665  3722,  FAX:  (413)
665  7292.   A  disk with program, help file, and example data files, plus book
(which has about 100 pages of intro to phylogenetic theory, and  250  pages  of
program  instructions),  is  $75  U.S. ($40 for the book alone).  Site licenses
are also available.   An earlier and less capable Version 2 (which for  example
cannot  read  nucleic  acid  sequences  and  has  fewer  features  for discrete
characters) is also available by anonymous  ftp  from  the  EMBL,  Indiana  and
Houston  molecular  biology  software servers.  Their addresses are given below
under the descriptions of TreeAlign and ClustalV.  MacClade 2.1 will  be  found
among their Mac software, as a squeezed and then binhexed file.
     MacClade enables you to use the  mouse-window  interface  to  specify  and
rearrange  phylogenies by hand, and watch the number of character steps and the
distribution of states of a given character on the tree change as  you  do  so.
MacClade  is  positively addictive and will give you a much better feel for the
tree and your data.  It's the closest thing to a phylogeny video  game  that  I
have  seen.   It  has been influential in spurring the inclusion of interaction
and graphics into other phylogeny programs.   (I  have  tried  to  supply  this
functionality  in  PHYLIP  by  incorporating  the  programs  MOVE, DOLMOVE, and
DNAMOVE,  which  act  somewhat  like  MacClade).   MacClade  does  not  have  a
sophisticated  search algorithm to find best trees: it largely relies on you to
do  it  by  hand  (which  is  surprisingly  effective),  with  only   a   local
rearrangement algorithm available to improve on that tree.
     4.  J. S. Farris has produced Hennig86, a fast parsimony program including
branch-and-bound  search  for  most  parsimonious  trees  and  interactive tree
rearrangement.  Although complete benchmarks have not been published it is said
to  be faster than Swofford's PAUP; both are a great many times faster than the
parsimony programs in PHYLIP.  The program is distributed in executable  object
code  only  and  costs  $50,  plus $5 mailing costs ($10 outside of the U.S.).
The user's name should be  stated,  as  copies  are  personalized  as  a  copy-
protection  measure.   It  is  distributed  by  Arnold  Kluge,  Amphibians  and
Reptiles, Museum of  Zoology,  University  of  Michigan,  Ann  Arbor,  Michigan
48109-1079,  U.S.A.  It runs on PC-compatible microcomputers with at least 512K
of RAM and needs no math coprocessor or graphics monitor.  It can handle up  to
180  taxa  and 999 characters.  An 80386 version, Hennig386, is currently being
tested but no release date has yet been announced.
     5. ClaDOS, an interactive program which allows rearrangement of trees  and
their  evaluation,  mapping of characters into them, and more, is available for
PCDOS systems from Kevin Nixon, L. H. Bailey Hortorium, Cornell University, 467
Mann  Library,  Ithaca, New York  14853.  I have been unable to get information
on its cost or method of distribution.
     6. Jotun Hein, (Institute of Genetics and Ecology, University  of  Aarhus,
8000  Aarhus  C, Denmark) has produced TreeAlign, a multiple sequence alignment
program that builds trees as it aligns DNA or protein  sequences.   It  uses  a
combination  of  distance  matrix and approximate parsimony methods.  TreeAlign
uses too much memory to run on PCs (DOS or  Mac  systems)  and  is  really
designed  for  a workstation or mainframe.  It is available by anonymous ftp at
the Indiana, Houston, and EMBL molecular biology software  distribution  sites.
Their    network    addresses    are    respectively:      ftp.bio.indiana.edu,
ftp.bchs.uh.edu, and ftp.embl-heidelberg.de.  In the Indiana archive  one  must
enter  directory  molbio/align,  in  the  Houston  archive  it  is in directory
pub/gene-server in the directories unix and vms, and on the EMBL archive it  is
in  pub/software/unix  and  pub/software/vax.   If  you are on Internet and use
molecular data it is important that you learn to use anonymous ftp  and  become
familiar with these ftp servers.
     7. Another multisequence alignment program  that  estimates  trees  as  it
aligns  multiple  sequences  is ClustalV.  An older version in PCDOS executable
form was distributed previously (see  below  for  information  on  how  to  get
executables  for  PC  or  Mac  for  the  current  version).   Currently  it  is
distributed as C source code by  its  author,  Desmond  Higgins.   Clustal  was
originally  developed  at  Trinity  College, Dublin, Ireland, but version V was
done at Higgins's current address, the European Molecular Biology  Laboratory,
Heidelberg,  Germany.   Clustal  V successfully compiles and runs on VAX/VMS C,
Apple  Macintosh  Think  C,  MSDOS  Turbo  C,  Decstation  ULTRIX  C,  and  Sun
workstations  with  GNU C.  It is a complete rewrite and upgrade of the Clustal
package which was described by Higgins and Sharp (1989).
     New features include the ability to detect and read different input formats
(NBRF/  PIR, Fasta, EMBL/Swissprot); align old alignments; produce phylogenetic
trees after alignment (Neighbor Joining trees with a bootstrap  option);  write
different alignment formats (Clustal, NBRF/PIR, GCG, PHYLIP); full command line
     The program is available by anonymous ftp at  the  Indiana,  Houston,  and
EMBL  molecular  biology  distribution  sites.   Their  network  addresses  are
respectively: ftp.bio.indiana.edu, ftp.bchs.uh.edu, and ftp.embl-heidelberg.de.
In the Indiana archive one must enter directory molbio/align,  in  the  Houston
archive  it is in directory pub/gene-server in all of the four directories dos,
Mac, unix, and vms, and on the EMBL  archive  it  is  in  pub/software/unix  or
pub/software/vax.   If  you  are  on  Internet  and  use  molecular  data it is
important  that  you learn to use anonymous ftp and become familiar with one or
more of these ftp servers.
     If you do not have any access to Internet, you could  alternatively  start
by sending e-mail to Des Higgins at:
          higgins at EMBL-Heidelberg.DE     (Internet)
     If you do not have access to e-mail, send a formatted PC or  MAC  diskette
(PLEASE state which) to:
     Des Higgins
     European Molecular Biology Laboratory
     Postfach 10.2209
     Meyerhofstrasse 1
     6900 Heidelberg
He will return the diskette with the source code  and  documentation.   He  can
also include an executable image for PC's or MAC.
     8. Gary Olsen, of the Department of Microbiology, University of  Illinois,
has  developed  a  speeded-up  version of my program DNAML coded in C, which he
calls "fastDNAml".  It achieves a number of economies and also is organized  so
that  it  can  be  run  on  parallel  processors  -- he and his co-workers have
constructed trees of very large size on a high-speed parallel  processor.   The
program  can  be  compiled using the "p4" portable parallel processing toolkit.
It can also be run in ordinary serial mode on workstations where it  is  faster
than  DNAML.   The  C  program is available by anonymous ftp from the Ribosomal
Database Project at info.mcs.anl.gov in directory pub/RDP/programs/fastDNAml.
     9. Andrey A. Zharkikh,  Andrey  Rzhetsky,  and  their  co-workers  in  the
Institute  of  Cytology and Genetics, Siberian Branch of the Russian Academy of
Sciences, Novosibirsk, Russia, Ex-USSR, have produced  VOSTORG,  a  package  of
programs for alignment (both manual and automatic) and inferring phylogenies by
distance methods and parsimony for molecular sequences.  It  runs  on  IBM  PC-
compatibles and includes some rather fancy graphics.  The authors are currently
in the U.S., not in Siberia, and their program is sold for about $250 by Exeter
Software,  100  North  Country  Road, Setauket, NY 11733, USA.  Their telephone
number is 1-800-842-5892; Fax (516)751-3435.  The programs are described  in  a
paper by Zharkikh et al. (1991).
     10. MEGA (Molecular Evolutionary Genetic Analysis) is due to  be  released
at the beginning of 1993 by Sudhir Kumar, Koichiro Tamura, and Masatoshi Nei of
the Institute of Molecular Evolutionary Genetics, 328 Mueller Lab, Pennsylvania
State  University,  University  Park, Pennsylvania 16802, U.S.A.  It will be an
executable program for PCDOS machines, and will be  menu-driven  with  context-
sensitive  help.  It will analyze data from DNA, RNA and protein sequences, and
distance matrices produced from other kinds of data as well.  It  will  include
the  Neighbor-Joining  distance  matrix  method,   a   branch   and   bound
parsimony method, and bootstrapping.  It will also plot trees on many kinds  of
printers.   The  program will be provided free of charge if you send one 1.2 Mb
5.25-inch or 1.44 Mb 3.5-inch floppy diskette, and will be sent as soon  as  it
is  available.   Inquiries  can  also  be  made  by mail to M. Nei at the above
address or by electronic mail  to  nxm2 at psuvm  (Bitnet)  or  nxm2 at psuvm.psu.edu
     11.  James Lake will soon distribute "Evomony", a program  for  using  the
"evolutionary parsimony" (invariants) method for inferring phylogenies from DNA
or RNA sequences.  It runs on 286 and 386 PCDOS  systems  with  at  least  500k
bytes  of  memory.  Lake intends to distribute a PCDOS version by April 1, 1993
(his choice of date, not mine!), with a Macintosh version to  follow  in  1994.
Both  will  be  distributed free to scientists in this field.  Exact procedures
for ordering Evomony have not yet been announced.  Lake's address is Department
of Biology, University of California, Los Angeles, California  90024.
     12.  Rod Page has written COMPONENT,  a  program  for  PCDOS  systems  for
comparing cladograms for use in phylogeny and biogeography studies.  It has far
more features for biogeographic studies (such as  comparing  species  and  area
cladograms)  than any other package.  It runs on PCDOS 286 or 386 systems under
Windows 3.0 or higher.   It will be released in the very near future.  Its cost
will  be  "in  the  $50-$75  range", and it can be ordered from Rod Page at the
Department of Botany, Natural History Museum, Cromwell Road,  London  SW7  5BD,
U.K.   His  phone and fax numbers are respectively (071)-938 9068 and 9260, and
his   e-mail   address   is   R.Page at natural-history-museum.imperial.ac.uk   or
rdp at nhm.ic.ac.uk.
     13.  David Penny (Department of Botany  and  Zoology,  Massey  University,
Palmerston  North, New Zealand) has been offering for free distribution several
PCDOS programs, one a fast parsimony program, TurboTree.  There  are  also  two
others,   Hadtree   which   computes   expected  frequencies  of  all  possible
distributions of nucleotides among species, and Great  Deluge,  an  approximate
search  for  the  most parsimonious tree by a quasi-random method.  He tells me
that funding exigencies are such that he may soon have to start  charging  for
these.  His electronic mail address is dpenny at massey.ac.nz.
     14.   Walter  Fitch  (Department  of  Ecology  and  Evolutionary  Biology,
University  of  California,  Irvine,  California   92717, U.S.A.) has a package
"Molevol" available  free  (on  receipt  of  an  appropriate  number  of  PCDOS
formatted  floppy disks) with about 20 FORTRAN programs for not only estimating
trees by parsimony and distance methods but doing various  other  manipulations
of  data that might be needed such as format interconversions and searching for
homology and secondary structure.  They are available as FORTRAN source  and/or
as  PCDOS  executables.  The FORTRAN programs will also run on Sun workstations
(and probably others too, I would suspect).  His  electronic  mail  address  is
wfitch at daedalus.bio.uci.edu.
     15. Kent Fiala, now of SAS Institute, has written a compatibility (clique)
program,  based  on  an  earlier  program written by Kent and George Estabrook.
Christopher Meacham has put the latest version of  CLINCH  (6.2),  with  Kent's
permission, as a self-extracting DOS archive on Jim Beach's TAXACOM fileserver,
huh.harvard.edu,  for  anonymous   FTP.    The   self-extracting   archive   is
"CLINCH62.EXE"  in  directory  /pub/software/clinch.  This should be FTPed as a
binary file.  CLINCH62.EXE is about 150 kb.  When you run it, it will expand to
14  files  requiring  about  280  kb.   The  executable  program is CLINCH.EXE.
Readme, documentation, sample input and output, and  FORTRAN  source  code  are
included.   PC-CLINCH is probably the most sophisticated compatibility analysis
program.  The Taxacom server, by the way, also has other  material  related  to
botanical systematics, including flora information.
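For readers unfamiliar with compatibility (clique) methods, the following is a
minimal Python sketch of the underlying idea.  It is not CLINCH's algorithm or
input format.  It assumes binary (0/1) characters only, for which two
characters are pairwise compatible unless all four state combinations occur
among the taxa.  The function names and the toy data are hypothetical, and the
clique search is brute force, so it is only practical for very small examples.

```python
from itertools import combinations


def compatible(a, b):
    """Return True if two binary characters are pairwise compatible.

    a and b are equal-length sequences of 0/1 states, one entry per taxon.
    They conflict only when all four pairs (0,0), (0,1), (1,0), (1,1) occur.
    """
    return len(set(zip(a, b))) < 4


def largest_clique(chars):
    """Brute-force search for a largest set of mutually compatible characters.

    Returns the indices of the characters in the first largest clique found.
    The running time is exponential in the number of characters.
    """
    n = len(chars)
    for size in range(n, 0, -1):
        for subset in combinations(range(n), size):
            if all(compatible(chars[i], chars[j])
                   for i, j in combinations(subset, 2)):
                return list(subset)
    return []


# Toy data (hypothetical): three characters scored on five taxa.
c0 = [1, 1, 0, 0, 0]
c1 = [1, 1, 1, 0, 0]
c2 = [1, 0, 1, 0, 0]
print(largest_clique([c0, c1, c2]))
```

Here c0 and c2 conflict, so no clique contains all three characters, and the
search settles on a pair.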
     16. Christopher Meacham (Department of Integrative Biology, University  of
California,  Berkeley,  California  94720,  U.S.A.)  produces COMPROB, a Pascal
program to compute probabilities that characters would be compatible at random,
thus  telling  us  which  clique  is  "most  surprising".   It is available for
anonymous ftp as a PCDOS executable from the Taxacom  server  (huh.harvard.edu)
in directory pub/mip.
     17.  The program MARKOV computes  a  distance  measure  between  pairs  of
nucleotide sequences.  It also constructs phylogenies from these and summarizes
the 4x4 substitution matrices between the pairs of species.   It  uses  a  more
general  model of substitution than used in PHYLIP, the Stationary Markov Model
described in the paper by Saccone et al. in Methods in Enzymology volume  183,
pages 570-583, 1990.  Bootstrapping is used to analyze the statistical error of
the results.  Output files from CLUSTAL and  PILEUP,  as  well  as  some  other
formats,  can  be used for input, and analysis can be confined to certain codon
positions in coding sequences.  The program is written in FORTRAN and  runs  on
VMS  systems.   It  was  produced  by Dr. Graziano Pesole and Professor Cecilia
Saccone at the University of Bari, Italy, and is available (for free?) from Dr.
Cecilia  Lanave at CSMME-CNR, Dipartimento di Biochimica e Biologia Molecolare,
Universita` di Bari, via Orabona 4, 70126 Bari, Italy.   Her  phone  number  is
39-80-243305,  her  fax  number  is  39-80-243317,  and  her  e-mail address is
lanave at vaxba0.ba.it or mvx36 at ibacsata.it
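To make the idea of a 4x4 substitution matrix between a pair of sequences
concrete, here is a small Python sketch.  It only tabulates observed base pairs
from two aligned sequences and reports the simple proportion of differing
sites.  It does not implement the Stationary Markov Model distance used by
MARKOV, and the function names and example sequences are hypothetical.

```python
BASES = "ACGT"


def substitution_counts(seq1, seq2):
    """Count aligned base pairs in a 4x4 table, rows = seq1, columns = seq2.

    Positions where either sequence has a gap or an ambiguity code are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    index = {base: i for i, base in enumerate(BASES)}
    counts = [[0] * 4 for _ in range(4)]
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x in index and y in index:
            counts[index[x]][index[y]] += 1
    return counts


def p_distance(counts):
    """Proportion of compared sites at which the two sequences differ."""
    total = sum(sum(row) for row in counts)
    if total == 0:
        raise ValueError("no comparable sites")
    same = sum(counts[i][i] for i in range(4))
    return (total - same) / total


# Hypothetical aligned pair; the gap column is ignored.
counts = substitution_counts("ACGTAC-A", "ACGAACGT")
for base, row in zip(BASES, counts):
    print(base, row)
print(p_distance(counts))
```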
     18. J. S.  Farris  and  Mary  Mickevich  earlier  released  a  package  of
phylogeny programs, PHYSYS, which, at about $5,000, was extremely expensive (in
my opinion, which is certainly a biased one).  I  am  not  sure  whether,  from
whom, or under what conditions it is still available.
     19.  Fujitsu Ltd. ("a $21 billion  global  leader  in  advanced  computer,
telecommunications,  and  electronic devices") sells for $28,000 US a Fujitsu S
family  workstation  complete  with  a   program,   SINCAIDEN,   which   allows
"experimental  researchers,  even  those  unfamiliar  with  such analyses, [to]
easily create phylogenetic trees in their own laboratories."  The program  also
allows  searches  of the major nucleic acid sequence and protein databases (the
ad I saw does not make it clear whether these databases are provided  with  the
workstation).   The  methods  available  are  UPGMA, neighbor-joining, Farris's
(Distance Wagner)  and  the  modified  Farris  distance  matrix  methods.   The
workstation  is  SPARC  compatible  and  runs SunOS.  The SINCAIDEN program was
developed by the group at the National Institute of Genetics, Japan  under  Dr.
Takashi  Gojobori.   Fujitsu  Ltd. may be contacted at 21-8, Nishi-Shinbashi 3-
chome, Minato-ku, Tokyo 105, Japan (phone 81-3-3437-5111 ext. 2831,  fax  81-3-
5472-4354),  or  in  the  U.S. at Fujitsu America Inc., 3055 Orchard Drive, San
Jose, California 95134-2017 (phone 1-408-432-1300  ext.  5168,  fax  1-408-434-
     20.  MUST, a package of sequence management programs, is distributed on  a
shareware  basis  by  Herve  Philippe,  Laboratoire de Biologie Cellulaire (URA
CNRS 1134 D), Batiment 444, Universite de Paris-Sud, 91405 Orsay cedex, France.
His  e-mail  address  is:  adoutte at frciti51  on Bitnet/EARN.  His phone and fax
numbers  are  respectively  and   MUST  is
available  on  a  shareware  basis  ($100  registration  fee if you do not send
diskettes) and runs on PCDOS systems using PCDOS version 3  or  later.   It  is
intended  as complementary to existing phylogeny and alignment programs and can
produce output files in the formats of PHYLIP, PAUP, Hennig86, and CLUSTAL.  It
contains a variety of sequence input, editing, checking, and storage functions,
as well as a sequence editor and a phylogeny plotter.  It also  allows  further
analyses of the results from these phylogeny programs.
     21.  Steve Smith, formerly of the Harvard Genome Laboratory,  has  written
an  X-Windows interactive sequence editor, GDE (Genetic Data Environment) which
allows the user to edit sequences and align them by hand, and to select subsets
of  sites  and  sequences  and  call  a  variety of analysis programs including
ClustalV and many of the PHYLIP 3.4 programs.  The GDE 2.0 system will  run  on
many  workstations  that  have  the  X  windowing system.  It also includes the
TreeTool tree-plotting program (see below).  GDE 2.0 is free and  is  available
for  anonymous  ftp  transfer  at  either  golgi.harvard.edu   in   directory
pub/GDE2.0 or ftp.bio.indiana.edu in directory molbio/unix/GDE.
     22.  Mike Maciukenas, at the Department of Microbiology of the  University
of  Illinois, has written a wonderful X-windows based interactive tree-plotting
program called TreeTool.  It takes as input a PHYLIP  tree  file,  with  branch
lengths  if  they  are provided, displays the tree in either rooted or unrooted
form on any X-windows screen, and allows the user to modify  the  form  of  the
tree and the placement of nodes and labels.  When the tree is in final form the
user can have it written to a Postscript file and/or printed to  a  Postscript-
compatible  printer.   TreeTool  is  free  as  a C program for X windows and is
available   for   anonymous   ftp   from   ftp.bio.indiana.edu   in   directory
molbio/unix/GDE.   It  is  also  included  in  the  GDE  2.0  sequence analysis
environment mentioned above.
Joe Felsenstein, Dept. of Genetics, Univ. of Washington, Seattle, WA 98195
 --> Internet:         joe at genetics.washington.edu     (IP No.
     Bitnet/EARN:      felsenst at uwavm
Good luck
