Long-range genomic patterns

Chad Price price at helios.unl.edu
Thu Apr 1 12:25:14 EST 1993


ecec at midway.uchicago.edu (Eric Cabot) writes:

>I saw the Chaos game representation in either Science or Scientific
>American and immediately rushed to the keyboard to adapt a mouse-driven 
>version of the Chaos game into one that could read DNA sequences.
>And it looked terrible! So then I tried various modifications including
>comparisons of several sequences. Nothing gave a pattern much different
>than the ones I got using randomly generated sequences. Rather than
>give up and get back to work, like a sensible person, I pursued the
>Chaos game representation to what I considered the ultimate degree.
>Instead of using the entire screen I used a small region for the coordinate
>system. Then that was shifted to the right for each new residue. Rather
>than just draw the points, a line was extended between each point and
>its successor. The representation of multiple (in this case aligned)
>sequences was accomplished by drawing each in a different color.

>And the result, you guessed it, it looked like randomness.
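
For readers who haven't seen it, the basic Chaos game mapping discussed
above can be sketched roughly as below. This is a minimal sketch, not
Eric's program: the corner assignment (A, C, G, T on the corners of the
unit square), the starting point, and the names CORNERS and cgr_points
are all assumptions made for illustration.

```python
# Assumed corner layout; other assignments are equally valid.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}

def cgr_points(seq):
    """Return the list of chaos-game points for a DNA sequence."""
    x, y = 0.5, 0.5  # start at the centre of the unit square
    points = []
    for base in seq.upper():
        corner = CORNERS.get(base)
        if corner is None:
            continue  # skip ambiguous symbols such as N
        # move halfway from the current point toward the base's corner
        x = (x + corner[0]) / 2.0
        y = (y + corner[1]) / 2.0
        points.append((x, y))
    return points
```

The shifted variant described in the quote would presumably add a
per-residue offset to x before plotting and join consecutive points
with line segments; that part is not shown here.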


I've applied complexity theory to 10-15 DNA sequences and compared the
results with what comes out of some simplistic random number generators
(taken from Press et al., Numerical Recipes).

Surprises galore: the "random" sequences were more "regular" than the DNA
sequences, and in fact it appeared that the codon areas may be the least
regular.  This is still uncertain because I haven't gone back to it with
well-documented sequences (i.e. where the locations of the introns, exons, and
codons are completely documented), but I think it fairly likely.
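
The complexity measure used is not spelled out above. As one simple
stand-in (an assumption, not necessarily the measure behind these
results), the Shannon entropy of overlapping k-mers gives a rough way to
compare a DNA sequence with a generated one. The names kmer_entropy and
random_dna are hypothetical, and Python's built-in generator stands in
for the Numerical Recipes routines.

```python
import math
import random
from collections import Counter

def kmer_entropy(seq, k=3):
    """Shannon entropy in bits of the overlapping k-mer distribution."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def random_dna(length, seed=0):
    """Uniform random sequence over ACGT (uses Python's own generator)."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(length))
```

For k=3 the maximum is 6 bits (64 equally likely triplets); a lower
value for one sequence than another means its triplet usage is more
skewed, i.e. more "regular" in this limited sense.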

The quality of the random number generator should have a great deal to do with
which is "more random". The ones on a typical Unix box are useless: when the
value is bit-shifted and you look at the last (least significant) bit, there is
a 101010 pattern, i.e. very non-random, even though the integer interpretation
of the value before bit-shifting has a very long repeat period.
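
The alternating low bit is easy to reproduce with a linear congruential
generator whose modulus is a power of two. The sketch below uses the
constants from the well-known ANSI C example rand() as an assumption;
it shows the low bit of the raw state, and whether a particular
system's rand() exposes that bit depends on its implementation. The
names lcg and low_bits are made up for this example.

```python
def lcg(seed, a=1103515245, c=12345, m=2**31):
    """Yield successive states of x -> (a*x + c) mod m."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x

def low_bits(seed, n):
    """Least significant bit of the first n states."""
    gen = lcg(seed)
    return [next(gen) & 1 for _ in range(n)]
```

Because a and c are both odd and m is even, each step flips the parity
of x, so the low bit has period 2 no matter how long the period of the
full state is.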


--
chad
price at helios.unl.edu
cprice at molecular.unmc.edu


