Hidden Markov Modeling Software Available

Richard Hughey rph at cse.ucsc.edu
Mon Feb 27 18:37:40 EST 1995


The Sequence Alignment and Modeling system (SAM) is a collection of
flexible software tools for creating, refining, and using linear
hidden Markov models for biological sequence analysis.  The model
states can be viewed as representing the sequence of columns in a
multiple sequence alignment, with provisions for arbitrary
position-dependent insertions and deletions in each sequence.  The
models are trained on a family of protein or nucleic acid sequences
using an expectation-maximization algorithm and a variety of
algorithmic heuristics.  A trained model can then be used both to
generate multiple alignments and to search databases for new members of
the family.  SAM is written in the C programming language for Unix
machines and MasPar parallel computers, and includes extensive
documentation.
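
To make the model structure described above concrete, here is a minimal
sketch in Python of scoring a sequence against a linear HMM with match,
insert, and delete states, using the standard forward algorithm.  It is
not taken from SAM (which is written in C): the function name, the
three-column toy model, and every probability value are assumptions
chosen for illustration, and for brevity the same transition
probabilities are shared by all positions, with the move into the End
state reusing the "into match" entries.

```python
ALPHABET = "ACGT"


def forward_probability(seq, match_emit, trans, insert_emit):
    """Return P(seq | model) for a linear HMM via the forward algorithm.

    match_emit  : one dict per model position (alignment column),
                  mapping residue -> emission probability.
    trans       : transition probabilities keyed by (from, to), where
                  the state types are "M", "I", and "D".
    insert_emit : dict mapping residue -> emission probability, shared
                  by all insert states.
    """
    n_nodes, n = len(match_emit), len(seq)

    def table():
        return [[0.0] * (n + 1) for _ in range(n_nodes + 1)]

    # f_x[j][i]: probability of emitting seq[:i] and being in state x of node j.
    f_m, f_i, f_d = table(), table(), table()
    f_m[0][0] = 1.0  # the match state of node 0 serves as the silent Begin state

    for j in range(n_nodes + 1):
        for i in range(n + 1):
            if i > 0:
                # Insert state: emits seq[i-1] and stays at node j.
                f_i[j][i] = insert_emit.get(seq[i - 1], 0.0) * (
                    f_m[j][i - 1] * trans["M", "I"]
                    + f_i[j][i - 1] * trans["I", "I"]
                    + f_d[j][i - 1] * trans["D", "I"])
            if j > 0:
                # Delete state: advances to node j without emitting.
                f_d[j][i] = (
                    f_m[j - 1][i] * trans["M", "D"]
                    + f_i[j - 1][i] * trans["I", "D"]
                    + f_d[j - 1][i] * trans["D", "D"])
            if i > 0 and j > 0:
                # Match state: emits seq[i-1] and advances to node j.
                f_m[j][i] = match_emit[j - 1].get(seq[i - 1], 0.0) * (
                    f_m[j - 1][i - 1] * trans["M", "M"]
                    + f_i[j - 1][i - 1] * trans["I", "M"]
                    + f_d[j - 1][i - 1] * trans["D", "M"])

    # Simplification: entering End uses the "into match" probabilities.
    return (f_m[n_nodes][n] * trans["M", "M"]
            + f_i[n_nodes][n] * trans["I", "M"]
            + f_d[n_nodes][n] * trans["D", "M"])


def column(consensus, p=0.85):
    """Toy match-state distribution favoring one residue (illustrative)."""
    rest = (1.0 - p) / (len(ALPHABET) - 1)
    return {a: (p if a == consensus else rest) for a in ALPHABET}


# Hypothetical three-column model with consensus "ACG".
MATCH_EMIT = [column(c) for c in "ACG"]
INSERT_EMIT = {a: 0.25 for a in ALPHABET}
TRANS = {
    ("M", "M"): 0.8, ("M", "I"): 0.1, ("M", "D"): 0.1,
    ("I", "M"): 0.6, ("I", "I"): 0.3, ("I", "D"): 0.1,
    ("D", "M"): 0.6, ("D", "I"): 0.1, ("D", "D"): 0.3,
}
```

Calling forward_probability("ACG", MATCH_EMIT, TRANS, INSERT_EMIT) and
comparing the result with the value for an unrelated sequence such as
"TTT" shows the idea behind database search: sequences that fit the
model's columns receive a higher probability.  A practical
implementation would work in log space to avoid underflow on long
sequences and would estimate the parameters from a sequence family
rather than fixing them by hand.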

The algorithms and methods used by SAM have been described in several
pioneering papers from the University of California, Santa Cruz.
These papers (citations below), as well as the SAM software suite, are
available via anonymous ftp to ftp.cse.ucsc.edu in the pub/protein
directory, or via the World-Wide Web to
http://www.cse.ucsc.edu/research/compbio/sam.html.

The software is freely available for non-commercial research use;
however, you will need an encryption key to decrypt the
pub/protein/sam1.0.tar.Z.crypt file available from the ftp server or
the WWW page.  Please send email to sam-info at cse.ucsc.edu to receive
the key or to make other arrangements if you do not have the crypt
utility.  The unencrypted documentation (UCSC Technical Report
UCSC-CRL-95-7) is in pub/protein/sam1.0_doc.ps.Z.

Although we plan to create an email or WWW server in the future, none
is currently available.  If you wish to use SAM, you must download the
code and compile it yourself, a process we have tried to make as
painless as possible.

Richard Hughey
Anders Krogh

sam-info at cse.ucsc.edu
http://www.cse.ucsc.edu/research/compbio/sam.html
-----------------------------------
Related papers:

A. Krogh, M. Brown, I. S. Mian, K. Sjölander, and D. Haussler.
 Hidden Markov models in computational biology: Applications to
 protein modeling.
 Journal of Molecular Biology, 235:1501-1531, February 1994.

R. Hughey and A. Krogh.
 SAM: Sequence alignment and modeling software system.
 Technical Report UCSC-CRL-95-7, University of California,
 Santa Cruz, CA, January 1995.

M. P. Brown, R. Hughey, A. Krogh, I. S. Mian, K. Sjölander, and D. Haussler.
 Using Dirichlet mixture priors to derive hidden Markov models
 for protein families.
 In L. Hunter, D. Searls, and J. Shavlik, editors, Proc. of First
 Int. Conf. on Intelligent Systems for Molecular Biology, pages 47-55, Menlo
 Park, CA, July 1993. AAAI/MIT Press.

D. Haussler, A. Krogh, I. S. Mian, and K. Sjölander.
 Protein modeling using hidden Markov models: Analysis of globins.
 In Proceedings of the Hawaii International Conference on System
 Sciences, volume 1, pages 792-802, Los Alamitos, CA, 1993. IEEE Computer
 Society Press.

R. Hughey.
 Massively parallel biosequence analysis.
 Technical Report UCSC-CRL-93-14, University of California, Santa
 Cruz, CA, April 1993.

A. Krogh, I. S. Mian, and D. Haussler.
 A hidden Markov model that finds genes in E. coli DNA.
 Nucleic Acids Research, 1994.
 In press.


