Summary of software for sequence alignment

Karyn Davis karyn at connect.com.au
Fri Apr 8 22:23:28 EST 1994


Here is the summary of responses to my posting on software for sequence
alignment.

--------------------------------------------------------------------------------
Gerald Hertz wrote:

I'm working on developing multiple alignment programs that can handle
gaps.  The gap penalty is determined by the statistical properties of
the alignment and is not directly supplied by the user.  The programs
are still evolving; however, a preliminary version is available by
anonymous ftp.  The version in the ftp directory is designed to
identify local alignments, not global alignments; however, the program
can probably be tricked into finding a global alignment if that is
what you need.  If you would like to use my programs, please feel free
to ask me questions because I realize my documentation may be somewhat
confusing.  I'm in the process of writing a manuscript describing the
statistics and the program.  The following is a portion of the
"readme.consensus.4" file that describes this program (lconsensus-gc)
and several others in more detail.

I have the following programs that can be accessed by anonymous ftp:

1) consensus (version 3)
2) wconsensus (version 3)
3) lconsensus-gc (version 1)
4) patser (version 2)
5) gmat-inf-gc (version 2)
6) fasta-consensus (version 1)

The "consensus" program is the current version of the program
described in Stormo and Hartzell (1989, PNAS, 86:1183-1187) and Hertz
et al. (1990, CABIOS, 6:81-92).  However, this program has many more
options than the published version.  The most significant change is that
each sequence may contribute more than one word to the pattern being
generated.  The "-ol" option corresponds to the original algorithm, in
which each sequence can contribute only one word.  Also, the program
now saves the best matrices after each cycle regardless of parentage
(the number of matrices saved is determined by the "-q" option).

"wconsensus" differs from the "consensus" program in that the user
does not supply the width of the pattern being sought.  Until I write
version 4 of "consensus", these two programs will differ in the
organization of their command lines and in some of their statistics.
Version 3 of "wconsensus" differs from version 2 in permitting
terminal gaps in the alignments when the "-pg2" option is used.

"lconsensus-gc" differs from the "wconsensus" program in that the
alignments can contain insertions and deletions (i.e., indels).  The
performance of "lconsensus-gc" is still being tested.  If you use this
program, please let me know how it works.

The "patser" program allows one to score the words of a sequence
against a summary matrix obtained from the "consensus" or "wconsensus"
program.
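
[Illustration added to this summary, not taken from the readme.]  As a
rough idea of what scoring the words of a sequence against a matrix
involves, here is a minimal Python sketch.  It is not patser's code: the
function name score_words, the list-of-dicts layout for the matrix, and
the example weights are all assumptions made for illustration, and
patser's real input format and statistics may differ.

```python
def score_words(sequence, matrix):
    """Score every word (window) of `sequence` against a weight matrix.

    `matrix` is a list with one dict per pattern position, mapping a
    letter to its weight (hypothetical layout, not patser's file format).
    Returns a list of (start_index, score) pairs.
    """
    width = len(matrix)
    scores = []
    for start in range(len(sequence) - width + 1):
        word = sequence[start:start + width]
        total = sum(matrix[i][letter] for i, letter in enumerate(word))
        scores.append((start, total))
    return scores


# Made-up weights for a 3-position pattern that favours "TAT".
example_matrix = [
    {"A": -1, "C": -1, "G": -1, "T": 2},
    {"A": 2, "C": -1, "G": -1, "T": -1},
    {"A": -1, "C": -1, "G": -1, "T": 2},
]

results = score_words("GGTATC", example_matrix)
best = max(results, key=lambda pair: pair[1])
```

Here the highest-scoring word starts at index 2 ("TAT").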

The "gmat-inf-gc" program can do a crude graphing of the information
content at each position of an alignment obtained with the
"consensus", "wconsensus", or "lconsensus-gc" program.
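
[Illustration added to this summary, not taken from the readme.]  The
sketch below shows one common way to compute and crudely graph the
information content of each alignment position, using
I = sum over letters b of f_b * log2(f_b / p_b).  It is not the
gmat-inf-gc code.  The function names, the uniform DNA background, and
the bar scale are assumptions, and the sketch ignores gaps and
small-sample corrections.

```python
import math
from collections import Counter


def information_content(aligned_words, background=None):
    """Return the information content (bits) of each alignment column.

    Uses I = sum_b f_b * log2(f_b / p_b), where f_b is the observed
    frequency of letter b in the column and p_b its background frequency.
    The uniform DNA background is an assumption of this sketch.
    """
    if background is None:
        background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
    info = []
    for column in zip(*aligned_words):
        counts = Counter(column)
        total = len(column)
        bits = 0.0
        for letter, count in counts.items():
            freq = count / total
            bits += freq * math.log2(freq / background[letter])
        info.append(bits)
    return info


def crude_graph(info, scale=10):
    """Draw one text bar per position, `scale` characters per bit."""
    return "\n".join(
        f"{pos + 1:3d} {'*' * round(bits * scale)} {bits:.2f}"
        for pos, bits in enumerate(info)
    )


words = ["TATA", "TATT", "TACA", "TGTA"]
info = information_content(words)
graph = crude_graph(info)
print(graph)
```

A fully conserved DNA column scores 2 bits under the uniform background;
columns with more variation score less.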

The "fasta-consensus" program converts a file from the FASTA sequence
format to a sequence format that can be used by the "wconsensus" and
"lconsensus-gc" programs.  The input is from the standard input and
the output is sent to the standard output.
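
[Illustration added to this summary, not taken from the readme.]  A
filter of this kind can be sketched in a few lines of Python.  The FASTA
parsing below is standard, but the output layout (one tab-separated
"name, sequence" line per record) is only a placeholder: the readme
excerpt does not describe the format that "wconsensus" and
"lconsensus-gc" expect, so read_fasta and convert are hypothetical names
and the output is not the real target format.

```python
import io


def read_fasta(stream):
    """Yield (name, sequence) pairs from FASTA-formatted text."""
    name, parts = None, []
    for line in stream:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(parts)
            # The name is the first word after ">"; the rest is description.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            parts = []
        elif name is not None:
            parts.append(line)
    if name is not None:
        yield name, "".join(parts)


def convert(in_stream, out_stream):
    """Write one 'name<TAB>sequence' line per record (placeholder format)."""
    for name, sequence in read_fasta(in_stream):
        out_stream.write(f"{name}\t{sequence}\n")


# Small in-memory demonstration.
demo_in = io.StringIO(">seq1 example\nACGT\nTTGA\n>seq2\nGGCC\n")
demo_out = io.StringIO()
convert(demo_in, demo_out)
```

Calling convert(sys.stdin, sys.stdout) instead (after "import sys") would
give the same standard-input to standard-output behaviour described
above.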

The source code can be copied by anonymous ftp from
"beagle.colorado.edu" (128.138.212.1).  The source for the first five
programs is located in the compressed tar files
"consensus-v3b.5.tar.Z", "wconsensus-v3a.1.tar.Z",
"lconsensus-gc-v1d.4.tar.Z", "patser-v2.3.tar.Z", and
"gmat-inf-gc-v2.1.tar.Z" (the "-v" indicates the version number).  The
source for the sixth program is located in the file "fasta-consensus-v1.c".
These files are located in the "pub/Consensus" directory.  Each tar
file contains a UNIX "makefile" that describes how to compile the
corresponding program and a copy of the corresponding directions
below.  The tar files should be unbundled in separate directories to
avoid name clashes.

The programs are written in C and were developed on a MIPS M/2000, a
DECstation 2100, a Silicon Graphics Indigo, and a SUN SPARCstation 10
using the BSD UNIX environments.  "wconsensus" and "lconsensus-gc" may
also work under the VMS operating system; however, I have not had an
opportunity to test them on a VAX.  If you discover aspects of my code
that are not compatible with your system, please let me know.

--------------------------------------------------------------------------------
Brett Lindenbach suggested GCG's PILEUP (Fortran) or ClustalV (C).
Ole Schuesseler also suggested ClustalV.
--------------------------------------------------------------------------------
Brian Osbourne suggested:
>/*  A MULTIPLE ALIGNMENT PROGRAM (MAP):
>
>    copyright (c) 1992 Xiaoqiu Huang
>    The distribution of the program is granted provided no charge is
>    made and the copyright notice is included.
>    E-mail: huang at cs.mtu.edu
>
>    Proper attribution of the author as the source of the software would
>    be appreciated: "On global sequence alignment" (to appear in CABIOS).
>              Xiaoqiu Huang
>              Department of Computer Science
>              Michigan Technological University
>              Houghton, MI 49931
>
>    The MAP program computes a multiple global alignment of sequences
>    using iterative pairwise method. The underlying algorithm for aligning
>    two sequences computes a best overlapping alignment between
>    two sequences without penalizing terminal gaps. In addition,
>    long internal gaps in short sequences are not heavily penalized.
>    So MAP is good at producing an alignment where there are long
>    terminal or internal gaps in some sequences. The MAP program is
>    designed in a space-efficient manner, so long sequences can be
>    aligned.
This is available by ftp.  Use archie to find out where to get it.
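
To make "a best overlapping alignment ... without penalizing terminal
gaps" concrete, here is a minimal Python sketch of pairwise scoring with
free end gaps.  It is not MAP's algorithm: the function name
overlap_score and the scoring values are assumptions, and internal gaps
get a simple linear penalty rather than MAP's lighter treatment of long
internal gaps.

```python
def overlap_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best pairwise alignment score with free terminal gaps.

    Standard dynamic programming, except that the first row and column
    start at 0 (leading gaps cost nothing) and the answer is the maximum
    over the last row and last column (trailing gaps cost nothing).
    Scoring values are arbitrary; internal gaps use a linear penalty.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            same = a[i - 1] == b[j - 1]
            diag = score[i - 1][j - 1] + (match if same else mismatch)
            score[i][j] = max(diag,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    last_row = max(score[rows - 1])
    last_col = max(row[cols - 1] for row in score)
    return max(last_row, last_col)


# The end of the first sequence overlaps the start of the second ("TTT").
overlap = overlap_score("ACGTTT", "TTTGGA")
# A short sequence contained in a longer one aligns without end penalties.
contained = overlap_score("GGACGTGG", "ACGT")
```

Because the unaligned ends cost nothing, the scores depend only on the
overlapping region, which is why this style of alignment suits sequences
of different lengths.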
--------------------------------------------------------------------------------
David Spafford suggested:
For simple alignments of, say, a dozen sequences (<400 bp), you can use
ABI's SeqEd.
Another program I have used is SEQSEE (pronounced "sexy").
Far and away the best alignment program is pileup (up to 60 sequences
with gaps) or gap (best fit of 2 sequences) from GCG (Genetics Computer
Group, Inc.).  We have the program on our local UNIX system.

If you fetch from Stanford University's biology shareware address, they
have some freebie programs that do alignments satisfactorily as well.

-------------------------------------------------------------------------------

Karyn Davis
karyn at connect.com.au


