Question: Random Sequence generation

Richard Hughey rph at cse.ucsc.edu
Thu Dec 21 16:46:35 EST 1995


More complex random sequences can be produced from a linear hidden
Markov model (HMM) of a family of sequences.  One advantage of this
approach is that the probabilities of insertions and deletions are
modeled as well as the character probabilities for each column.  If
you have a group of sequences similar to the random ones you desire,
you can train an HMM for the family and then generate sequences from it.
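
To make the idea concrete, here is a rough Python sketch of emitting a
sequence from a toy linear HMM.  It is not how SAM or HMMER do it: the
function names, the per-column keys ("emit", "p_delete", "p_insert") and
the single shared background distribution are all assumptions made for
illustration, and a real profile HMM makes each transition depend on
the state you are coming from.

```python
import random

def weighted_choice(dist, rng):
    """Pick one character from a {char: probability} dict."""
    chars = list(dist)
    return rng.choices(chars, weights=[dist[c] for c in chars], k=1)[0]

def sample_linear_hmm(columns, background, rng=None):
    """Emit one sequence from a simplified linear HMM (hypothetical layout).

    columns    : list of dicts, one per model column, with keys
                 "emit"     {char: prob} for the match state,
                 "p_delete" probability of skipping the column,
                 "p_insert" probability of each extra inserted character
                            (must be < 1 or the loop never ends).
    background : {char: prob} used for inserted characters.
    """
    rng = rng or random.Random()
    out = []
    for col in columns:
        # Delete: skip this column without emitting anything.
        if rng.random() >= col["p_delete"]:
            # Match: emit a character from this column's distribution.
            out.append(weighted_choice(col["emit"], rng))
        # Insert: self-loop that emits background characters.
        while rng.random() < col["p_insert"]:
            out.append(weighted_choice(background, rng))
    return "".join(out)

if __name__ == "__main__":
    # Made-up three-column DNA model, not trained on any real family.
    model = [
        {"emit": {"A": 0.9, "C": 0.1}, "p_delete": 0.05, "p_insert": 0.10},
        {"emit": {"C": 0.5, "G": 0.5}, "p_delete": 0.20, "p_insert": 0.05},
        {"emit": {"T": 1.0},           "p_delete": 0.00, "p_insert": 0.30},
    ]
    uniform = {c: 0.25 for c in "ACGT"}
    for _ in range(5):
        print(sample_linear_hmm(model, uniform))
```

Because deletions and insertions are sampled along with the characters,
the generated sequences vary in length as well as in content.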

Both Sean Eddy's HMMER (http://genome.wustl.edu/eddy/hmm.html) and our
SAM system (http://www.cse.ucsc.edu/research/compbio/sam.html) have
programs for doing this.  SAM has a WWW server for training HMMs and
performing multiple alignments and distance scoring based on the
model, but at the moment, to generate sequences yourself, you'll
need to get a copy of the source code by sending email to
sam-info at cse.ucsc.edu.

Richard


In article <4bceuv$4u4 at knot.queensu.ca>, sibbald at qucis.queensu.ca (Peter Sibbald) writes:
|> 
|> re: random DNA or protein sequences.
|> 
|> The word "random" is a little vague. You can try any of the following
|> depending on what questions you are asking:
|> 
|> 1. generate sequences with probabilistically the same character
|>    frequencies as some real sequence. Seldom will the frequency
|>    in the generated sequence be exactly the same as in the real
|>    sequence.
|> 
|> 2. "shuffle" an existing sequence so that the order changes but
|>    the character frequencies remain the same. a lot of programs
|>    do this kind of thing, GCG for example, if memory serves.
|> 
|> 3. generate sequences with same single character frequencies as
|>    a real sequence AND the same adjacency frequencies (doublet
|>    frequencies). For example, the pair "qz" is rare in english,
|>    perhaps i generate it in a string with probability 0. This
|>    is just a Markov chain with a memory and of course the memory
|>    can vary (i.e. you can use triplets, 4-plets etc.).
|> 
|> 4. generate all characters with equal likelihood.
|> 
|> etc. take your pick.
|> 
|> peter sibbald, sibbald at qucis.queensu.ca
|> 
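
For anyone who wants to try the four options Peter lists, here is a
minimal Python sketch of each.  The function names are made up for
illustration.  For option 3, the fallback used when a context has no
observed successor (it only occurs at the very end of the training
sequence) is my own assumption; other choices are possible.

```python
import random
from collections import Counter, defaultdict

def sample_by_frequency(seq, length, rng):
    """Option 1: draw characters independently with the frequencies of seq."""
    counts = Counter(seq)
    chars = sorted(counts)
    weights = [counts[c] for c in chars]
    return "".join(rng.choices(chars, weights=weights, k=length))

def shuffle_sequence(seq, rng):
    """Option 2: permute seq; the composition is preserved exactly."""
    chars = list(seq)
    rng.shuffle(chars)
    return "".join(chars)

def sample_markov(seq, length, rng, order=1):
    """Option 3: Markov chain of the given order (1 = doublet frequencies)."""
    if order < 1 or len(seq) <= order:
        raise ValueError("need order >= 1 and len(seq) > order")
    # Count which character follows each context of `order` characters.
    transitions = defaultdict(Counter)
    for i in range(len(seq) - order):
        transitions[seq[i:i + order]][seq[i + order]] += 1
    # Start from a context that actually occurs in seq.
    start = rng.randrange(len(seq) - order + 1)
    out = list(seq[start:start + order])
    singles = Counter(seq)
    while len(out) < length:
        followers = transitions.get("".join(out[-order:]))
        if not followers:
            # Assumption: on a dead end, fall back to single-character
            # frequencies rather than stopping early.
            followers = singles
        chars = sorted(followers)
        weights = [followers[c] for c in chars]
        out.append(rng.choices(chars, weights=weights, k=1)[0])
    return "".join(out)[:length]

def sample_uniform(alphabet, length, rng):
    """Option 4: every character of the alphabet is equally likely."""
    return "".join(rng.choice(alphabet) for _ in range(length))

if __name__ == "__main__":
    rng = random.Random()
    real = "ACGTTGCAACGGTTAACCGGATAT"  # placeholder, not a real sequence
    print(sample_by_frequency(real, 24, rng))
    print(shuffle_sequence(real, rng))
    print(sample_markov(real, 24, rng, order=2))
    print(sample_uniform("ACGT", 24, rng))
```

Only the shuffle (option 2) reproduces the character counts exactly; the
others match them in expectation, as Peter notes for option 1.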



