Nucleotide sequence simulation
Anders Gorm Pedersen
gorm at cbs.dtu.dk
Wed Jul 4 02:19:22 EST 2001
Patrick Brunner wrote:
> I am looking for a Win or Mac program that will simulate/generate
> nucleotide sequences. The program should create new data sets starting
> from given aligned sequences according to current evolutionary models of
> nucleotide substitutions (e.g. K2P, HKY etc.). Thus, nucleotide
> frequencies, TS/TV ratios or gamma/codon rate heterogeneity and other
> parameters should be automatically determined from the originally given
> data set or manually changeable.
I think you might find Hidden Markov Models (HMMs) to be useful for this
kind of thing. Briefly, this type of model can be estimated ("trained") on a
set of aligned sequences, and then used in "generative mode" to produce
sequences having the same characteristics as the aligned set. I've included
a few pointers below. The HMM packages HMMER (pronounced "hammer") and SAM
are free for academic users but only run on Unix systems. HMMpro is also free
for academic users and runs on some Unix flavors and also on Windows NT.
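
To make the train-then-generate idea concrete, here is a minimal Python
sketch. It is a deliberate simplification and not how HMMER, SAM, or HMMpro
work internally: it keeps only one "match" state per alignment column (no
insert or delete states, so every generated sequence has the alignment's
length), and the names train_column_model, generate, and pseudocount are
made up for illustration.

```python
import random
from collections import Counter

ALPHABET = "ACGT"

def train_column_model(alignment, pseudocount=1.0):
    """Estimate per-column nucleotide probabilities from aligned sequences.

    Assumption: all sequences have equal length. Characters outside
    ACGT (e.g. gaps) are ignored when counting.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    model = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment if seq[col] in ALPHABET)
        # Pseudocounts keep unseen nucleotides from getting probability zero.
        total = sum(counts[b] for b in ALPHABET) + pseudocount * len(ALPHABET)
        model.append([(counts[b] + pseudocount) / total for b in ALPHABET])
    return model

def generate(model, rng):
    """Sample one sequence, drawing each column from its own distribution."""
    return "".join(rng.choices(ALPHABET, weights=col)[0] for col in model)

if __name__ == "__main__":
    aligned = ["ACGTAC", "ACGAAC", "ACGTTC", "ATGTAC"]
    model = train_column_model(aligned)
    rng = random.Random(2001)  # fixed seed so runs are repeatable
    for _ in range(3):
        print(generate(model, rng))
```

Conserved columns come out nearly the same in every generated sequence,
while variable columns vary in proportion to what was seen in the input.
A real profile HMM adds insert and delete states, so it can also reproduce
length variation.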
A good general introduction is:
A. Krogh 1998.
An Introduction to Hidden Markov Models for Biological Sequences,
In S. L. Salzberg et al., eds., Computational Methods in Molecular Biology,
I'll forward this chapter to you in PDF.
Hope this helps!
Anders Gorm Pedersen, Ph.D.
Center for Biological Sequence Analysis, www.cbs.dtu.dk
Technical University of Denmark