Nucleotide sequence simulation

Anders Gorm Pedersen gorm at cbs.dtu.dk
Wed Jul 4 02:19:22 EST 2001


Patrick Brunner wrote:

> I am looking for a Win or Mac program that will simulate/generate
> nucleotide sequences. The program should create new data sets starting
> from given aligned sequences according to current evolutionary models of
> nucleotide substitutions (e.g. K2P, HKY etc.). Thus, nucleotide
> frequencies, TS/TV ratios or gamma/codon rate heterogeneity and other
> parameters should be automatically determined from the originally given
> data set or manually changeable.

I think you might find Hidden Markov Models (HMMs) useful for this 
kind of thing. Briefly, this type of model can be estimated ("trained") on a 
set of aligned sequences and then used in "generative mode" to produce 
sequences having the same characteristics as the aligned set. I've included 
a few pointers below. The HMM packages HMMER (pronounced "hammer") and SAM 
are free for academic users but run only on Unix systems. HMMpro is also free 
for academic users and runs on some Unix flavors as well as on Windows NT. 
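To make the "train, then generate" idea concrete, here is a minimal Python 
sketch. It is a deliberately simplified, hypothetical example and is not how 
HMMER, SAM or HMMpro work internally. It assumes an ungapped, uppercase DNA 
alignment and uses a profile with match states only (no insert or delete 
states), so "training" reduces to counting bases per column with a 
pseudocount, and "generative mode" means walking the states left to right and 
emitting one base from each. The function names and the pseudocount value are 
illustrative assumptions. Note that such a model reproduces per-column base 
composition but not correlations between columns, and it does not model a 
phylogeny or substitution parameters such as the TS/TV ratio.

```python
import random
from collections import Counter

ALPHABET = "ACGT"


def train_match_only_profile(alignment, pseudocount=1.0):
    """Estimate per-column emission probabilities from an ungapped alignment.

    Hypothetical simplification of a profile HMM: one match state per column,
    deterministic transitions (state i -> state i+1), so maximum-likelihood
    training is just counting bases in each column, plus pseudocounts.
    Characters outside ACGT (e.g. gaps) are ignored when counting.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for i in range(length):
        counts = Counter(seq[i] for seq in alignment)
        total = sum(counts[b] for b in ALPHABET) + pseudocount * len(ALPHABET)
        profile.append({b: (counts[b] + pseudocount) / total for b in ALPHABET})
    return profile


def generate(profile, n, rng=None):
    """Sample n new sequences by emitting one base from each match state in turn."""
    rng = rng or random.Random()
    seqs = []
    for _ in range(n):
        seq = "".join(
            rng.choices(ALPHABET, weights=[col[b] for b in ALPHABET])[0]
            for col in profile
        )
        seqs.append(seq)
    return seqs


if __name__ == "__main__":
    # Toy alignment, made up purely for illustration.
    alignment = ["ACGTAC", "ACGTTC", "ACCTAC", "AGGTAC"]
    profile = train_match_only_profile(alignment)
    for s in generate(profile, 3, rng=random.Random(42)):
        print(s)
```

A real profile HMM adds insert and delete states so that gapped alignments and 
length variation can be modelled, and its parameters are usually estimated 
with more careful priors than a flat pseudocount.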

A good general introduction is: 

A. Krogh 1998. 
An Introduction to Hidden Markov Models for Biological Sequences, 
In S. L. Salzberg et al., eds., Computational Methods in Molecular Biology, 
45-63.  Elsevier. 

I'll forward this chapter to you in PDF.

Hope this helps!
Best regards,
Anders Gorm

 http://www.netid.com/html/hmmpro.html
 http://www.cbs.dtu.dk/krogh/refs.html
 http://www.cse.ucsc.edu/research/compbio/ismb99.tutorial.html
 http://www.molbiol.ox.ac.uk/documentation/hmmer-html/main-expanded.html
 http://www.cse.ucsc.edu/research/compbio/sam.html

-- 
Anders Gorm Pedersen, Ph.D.  
Center for Biological Sequence Analysis, www.cbs.dtu.dk
Technical University of Denmark





More information about the Mol-evol mailing list