Question: Random Sequence generation
Richard Hughey
rph at cse.ucsc.edu
Thu Dec 21 16:46:35 EST 1995
More complex random sequences can be produced from a linear hidden
Markov model (HMM) of a family of sequences. One advantage of this
approach is that the probabilities of insertions and deletions are
modeled as well as the character probabilities for each column. If
you have a group of sequences similar to the random ones you desire,
you can train an HMM for the family and then generate sequences from it.
Both Sean Eddy's HMMER (http://genome.wustl.edu/eddy/hmm.html) and our
SAM system (http://www.cse.ucsc.edu/research/compbio/sam.html) have
programs for doing this. SAM has a WWW server for training HMMs and
for performing multiple alignments and distance scoring based on the
model, but for typical sequence generation you'll currently need to
get a copy of the source code by sending email to
sam-info at cse.ucsc.edu.
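
To make the idea concrete, here is a minimal Python sketch of drawing a
sequence from a linear HMM. It is not the HMMER or SAM code or file format:
the MODEL table, the BACKGROUND distribution, and the names draw() and
generate() are all made up for illustration, and the per-column insert and
delete probabilities are a simplification of a real profile HMM, where
transitions also depend on the previous state.

```python
import random

# Hypothetical toy model, one entry per column of the family alignment.
# "emit" holds the match-state character probabilities for that column;
# "p_insert" is the chance of emitting an extra character before the column
# (must be < 1), and "p_delete" is the chance of skipping the column.
MODEL = [
    {"emit": {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1}, "p_insert": 0.10, "p_delete": 0.05},
    {"emit": {"A": 0.1, "C": 0.6, "G": 0.2, "T": 0.1}, "p_insert": 0.05, "p_delete": 0.10},
    {"emit": {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7}, "p_insert": 0.20, "p_delete": 0.05},
]

# Assumed distribution for inserted characters (uniform over DNA here).
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def draw(dist, rng):
    """Pick one character from a {character: probability} table."""
    chars = list(dist)
    return rng.choices(chars, weights=[dist[c] for c in chars], k=1)[0]


def generate(model, rng):
    """Walk the columns left to right, allowing insertions and deletions."""
    out = []
    for col in model:
        # Insert state: keep emitting background characters while it loops.
        while rng.random() < col["p_insert"]:
            out.append(draw(BACKGROUND, rng))
        # Delete state: skip this column without emitting anything.
        if rng.random() < col["p_delete"]:
            continue
        # Match state: emit a character from the column's own distribution.
        out.append(draw(col["emit"], rng))
    return "".join(out)


if __name__ == "__main__":
    rng = random.Random(1995)  # fixed seed so runs are repeatable
    for _ in range(5):
        print(generate(MODEL, rng))
```

In practice the column probabilities would come from training on your own
family of sequences rather than being typed in by hand.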
Richard
In article <4bceuv$4u4 at knot.queensu.ca>, sibbald at qucis.queensu.ca (Peter Sibbald) writes:
|>
|> re: random DNA or protein sequences.
|>
|> The word "random" is a little vague. You can try any of the following
|> depending on what questions you are asking:
|>
|> 1. generate sequences with probabilistically the same character
|> frequencies as some real sequence. Seldom will the frequency
|> in the generated sequence be exactly the same as in the real
|> sequence.
|>
|> 2. "shuffle" an existing sequence so that the order changes but
|> the character frequencies remain the same. A lot of programs
|> do this kind of thing, GCG for example, if memory serves.
|>
|> 3. generate sequences with the same single-character frequencies as
|> a real sequence AND the same adjacency frequencies (doublet
|> frequencies). For example, the pair "qz" is rare in English;
|> perhaps I generate it in a string with probability 0. This
|> is just a Markov chain with a memory, and of course the memory
|> can vary (i.e. you can use triplets, 4-plets etc.).
|>
|> 4. generate all characters with equal likelihood.
|>
|> etc. take your pick.
|>
|> peter sibbald, sibbald at qucis.queensu.ca
|>
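
The four options in the quoted list can be sketched in a few lines of
Python each. This is only an illustration, not code from GCG or any other
package: the function names are made up, the Markov version uses a memory
of one character (doublets) only, and its handling of a character that has
no observed successor (restart from a random character of the source) is
an assumption of this sketch.

```python
import random
from collections import Counter, defaultdict


def sample_by_frequency(seq, length, rng):
    """Option 1: independent draws using the character frequencies of seq."""
    counts = Counter(seq)
    chars = list(counts)
    return "".join(rng.choices(chars, weights=[counts[c] for c in chars], k=length))


def shuffle_sequence(seq, rng):
    """Option 2: reorder seq; the character counts stay exactly the same."""
    chars = list(seq)
    rng.shuffle(chars)
    return "".join(chars)


def markov_sequence(seq, length, rng):
    """Option 3: first-order Markov chain built from the doublet counts of seq."""
    if length <= 0 or not seq:
        return ""
    followers = defaultdict(Counter)
    for a, b in zip(seq, seq[1:]):
        followers[a][b] += 1
    current = rng.choice(seq)  # start in proportion to single-character counts
    out = [current]
    while len(out) < length:
        nxt = followers.get(current)
        if nxt:
            chars = list(nxt)
            current = rng.choices(chars, weights=[nxt[c] for c in chars], k=1)[0]
        else:
            # Assumption: a character seen only at the end of seq has no
            # successor, so restart from a random character of the source.
            current = rng.choice(seq)
        out.append(current)
    return "".join(out)


def uniform_sequence(alphabet, length, rng):
    """Option 4: every character of the alphabet is equally likely."""
    return "".join(rng.choice(alphabet) for _ in range(length))


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so runs are repeatable
    real = "ACGTACGGTCA"     # stand-in for a real sequence
    print(sample_by_frequency(real, 20, rng))
    print(shuffle_sequence(real, rng))
    print(markov_sequence(real, 20, rng))
    print(uniform_sequence("ACGT", 20, rng))
```

A pair that never occurs in the source, like "qz" in the English example,
gets probability 0 in markov_sequence() as long as no restart happens.
Longer memories (triplets, 4-plets) would key the table on the previous two
or three characters instead of one.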