Question: Random Sequence generation
sibbald at qucis.queensu.ca
Thu Dec 21 15:10:07 EST 1995
re: random DNA or protein sequences.
The word "random" is a little vague. You can try any of the following
depending on what questions you are asking:
1. Generate sequences with, probabilistically, the same character
frequencies as some real sequence. Seldom will the frequencies
in the generated sequence be exactly the same as in the real one.
2. "Shuffle" an existing sequence so that the order changes but
the character frequencies remain the same. A lot of programs
do this kind of thing (GCG, for example, if memory serves).
3. Generate sequences with the same single-character frequencies as
a real sequence AND the same adjacency frequencies (doublet
frequencies). For example, the pair "qz" is rare in English,
so I might generate it with probability 0. This
is just a Markov chain with memory, and of course the memory
length can vary (e.g. you can use triplets, 4-plets, etc.).
4. Generate all characters with equal likelihood.
Etc. Take your pick.
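The four options above can be sketched in Python as follows. This is a minimal illustration, assuming sequences are plain strings (e.g. DNA over ACGT); the function names (`freq_matched`, `shuffled`, `markov`, `uniform`) are hypothetical and are not GCG's or any other package's interface. How the Markov version handles a context that occurs only at the very end of the real sequence is also an assumption; other choices (such as treating the sequence as circular) are equally reasonable.

```python
import random
from collections import Counter, defaultdict


def freq_matched(real, length, rng=random):
    """Option 1: draw each character independently, weighted by its
    frequency in `real`. Composition matches only on average."""
    counts = Counter(real)
    alphabet = list(counts)
    weights = [counts[c] for c in alphabet]
    return "".join(rng.choices(alphabet, weights=weights, k=length))


def shuffled(real, rng=random):
    """Option 2: permute `real`. Composition is preserved exactly."""
    chars = list(real)
    rng.shuffle(chars)
    return "".join(chars)


def markov(real, length, order=1, rng=random):
    """Option 3: order-k Markov chain estimated from `real`.
    order=1 reproduces doublet frequencies, order=2 triplets, and so on.
    Transitions never seen in `real` get probability 0 (except at the
    dead-end jumps handled below)."""
    if order < 1 or len(real) <= order:
        raise ValueError("need order >= 1 and len(real) > order")
    # Map each k-character context to the list of characters that follow it;
    # sampling uniformly from this list reproduces the observed frequencies.
    transitions = defaultdict(list)
    for i in range(len(real) - order):
        transitions[real[i:i + order]].append(real[i + order])
    contexts = list(transitions)
    out = list(rng.choice(contexts))  # seed with a context seen in `real`
    while len(out) < length:
        followers = transitions.get("".join(out[-order:]))
        if not followers:
            # Dead end: this context occurs only at the very end of `real`.
            # Assumed workaround: draw the next character from the followers
            # of a randomly chosen observed context.
            followers = transitions[rng.choice(contexts)]
        out.append(rng.choice(followers))
    return "".join(out[:length])


def uniform(alphabet, length, rng=random):
    """Option 4: every character of `alphabet` equally likely."""
    return "".join(rng.choice(alphabet) for _ in range(length))


if __name__ == "__main__":
    rng = random.Random(1995)  # fixed seed for reproducible output
    real = "ATGGCGTACGTTAGCCGATAGCTAGGCTAACG"  # made-up example sequence
    print(freq_matched(real, 30, rng))
    print(shuffled(real, rng))
    print(markov(real, 30, order=1, rng=rng))
    print(uniform("ACGT", 30, rng))
```

Only option 2 keeps the composition exactly; option 1 matches it only in expectation, as item 1 notes. Raising `order` in `markov` moves from doublets to triplets, 4-plets, etc., at the cost of needing a longer real sequence to estimate the transitions reliably.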
peter sibbald, sibbald at qucis.queensu.ca