Question: Random Sequence generation

Peter Sibbald sibbald at qucis.queensu.ca
Thu Dec 21 15:10:07 EST 1995


re: random DNA or protein sequences.

The word "random" is a little vague. You can try any of the following
depending on what questions you are asking:

1. generate sequences by drawing each character independently,
   with probability equal to its frequency in some real sequence.
   Seldom will the frequencies in the generated sequence be exactly
   the same as in the real sequence.

2. "shuffle" an existing sequence so that the order changes but
   the character frequencies remain exactly the same. A lot of
   programs do this kind of thing, GCG for example, if memory serves.

3. generate sequences with the same single character frequencies as
   a real sequence AND the same adjacency frequencies (doublet
   frequencies). For example, the pair "qz" is rare in English,
   so I might generate it in a string with probability 0. This
   is just a Markov chain with a memory, and of course the length
   of the memory can vary (i.e. you can use triplets, 4-plets, etc.).

4. generate all characters with equal likelihood.

etc. take your pick.
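
For what it is worth, the four options above can be sketched in a few
lines of Python. This is only an illustration under my own assumptions,
not what GCG or any other package actually does: the function names and
the toy template are made up for the example, and the dead-end fallback
in the Markov version is one arbitrary choice among several. Note that
the Markov version reproduces the doublet (or k-plet) frequencies only
in expectation, not exactly.

```python
import random
from collections import Counter, defaultdict


def sample_by_frequency(template, length, rng):
    """Option 1: draw each character independently, weighted by its
    frequency in the template. Composition matches only on average."""
    counts = Counter(template)
    symbols = sorted(counts)
    weights = [counts[s] for s in symbols]
    return "".join(rng.choices(symbols, weights=weights, k=length))


def shuffle_sequence(template, rng):
    """Option 2: permute the template, so composition is kept exactly."""
    chars = list(template)
    rng.shuffle(chars)
    return "".join(chars)


def sample_markov(template, length, rng, order=1):
    """Option 3: Markov chain whose memory is `order` characters
    (order=1 uses doublets, order=2 triplets, and so on)."""
    if order < 1 or len(template) <= order:
        raise ValueError("need order >= 1 and a template longer than order")
    # Count which character follows each context of `order` characters.
    transitions = defaultdict(Counter)
    for i in range(len(template) - order):
        transitions[template[i:i + order]][template[i + order]] += 1
    # Start from a context that really occurs in the template.
    start = rng.randrange(len(template) - order + 1)
    out = list(template[start:start + order])
    while len(out) < length:
        followers = transitions.get("".join(out[-order:]))
        if not followers:
            # Assumption: if the context was only seen at the very end of
            # the template, fall back to single character frequencies.
            followers = Counter(template)
        symbols = sorted(followers)
        weights = [followers[s] for s in symbols]
        out.append(rng.choices(symbols, weights=weights)[0])
    return "".join(out[:length])


def sample_uniform(alphabet, length, rng):
    """Option 4: every character of the alphabet is equally likely."""
    return "".join(rng.choice(alphabet) for _ in range(length))


if __name__ == "__main__":
    rng = random.Random(0)             # fixed seed so runs are repeatable
    template = "ACGTTGCAACGGTTAACCGT"  # toy sequence, not real data
    print(sample_by_frequency(template, 20, rng))
    print(shuffle_sequence(template, rng))
    print(sample_markov(template, 20, rng, order=2))
    print(sample_uniform("ACGT", 20, rng))
```

Passing in a seeded random.Random keeps the runs repeatable, which is
handy if the random sequences are going to serve as a null model.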

peter sibbald, sibbald at qucis.queensu.ca




