How to set window size of DNA

Eric E. Snyder eesnyder at CERF.NET
Mon Jun 26 20:38:39 EST 1995


robison at lipid.harvard.edu (Keith Robison) wrote:
>Setting the window size is always a bugaboo. 
>Some additional options include:
>	2) Computationally exorbitant, but possibly elegant,
>	   is to look at _every_ window size.  I.e., draw
>	   a 3D plot of value vs. position vs. window size.

This is essentially what I have done with the program GeneParser,
which identifies protein-coding regions in genomic DNA.  Because
I wanted to do an exhaustive search for the optimal parsing of
a gene into exons and introns, the classification statistics
(hexamer stats, local complexity, etc.) need to be scored over
all possible intervals (i.e., you need to look at all possible
window sizes at every position).  Having done all that, the
program plots the scores in a 3D plot like the one Keith
suggests.  I have a few examples of such plots on the net at:

http://beagle.colorado.edu/~eesnyder/GeneParser.html

Matrix plots are quite a good way to get a feel for various
properties of a sequence that simple strip plots can't show.
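
As a rough illustration of scoring every interval, here is a minimal
Python sketch.  It is not GeneParser's code: the function name
interval_score_matrix and the use of G+C fraction as the statistic
are assumptions chosen only to keep the example short.  Prefix sums
make each interval's score a constant-time lookup, so filling the
whole matrix takes O(N^2) time.

```python
def interval_score_matrix(seq, is_hit=lambda base: base in "GC"):
    """Score every interval of seq with a simple stand-in statistic.

    Returns a matrix M where M[i][j] is the fraction of "hit" bases
    (G or C by default) in seq[i..j] inclusive, for i <= j.
    Cells with i > j are left as None.
    """
    n = len(seq)

    # prefix[k] = number of hits in seq[0:k]
    prefix = [0] * (n + 1)
    for k, base in enumerate(seq):
        prefix[k + 1] = prefix[k] + (1 if is_hit(base) else 0)

    matrix = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            hits = prefix[j + 1] - prefix[i]
            matrix[i][j] = hits / (j - i + 1)
    return matrix


if __name__ == "__main__":
    m = interval_score_matrix("ACGT")
    for row in m:
        print(row)
```

Row i, column j of the result corresponds to the window starting at
position i and ending at position j, so the matrix can be handed
directly to any heat-map or surface plotting routine.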

Just to finish the story (and give a shameless plug for the
program), for each interval in the sequence, the stats mentioned
above can be weighted by a neural net to give an overall likelihood
that the interval belongs to a particular class (intron or exon).
Because all of the roughly (N^2)/2 intervals have been scored for
membership in each class, the globally optimal parsing of the
sequence can be calculated by a very simple dynamic programming
algorithm.
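
The dynamic programming step can be sketched as follows.  This is an
illustrative guess at the general shape of such an algorithm, not
GeneParser's actual recurrence: the function name best_parse, the
two-class encoding (0 and 1), and the rule that adjacent segments
must alternate in class are all assumptions made for the example.
The input is a precomputed table of interval scores per class.

```python
def best_parse(scores):
    """Find the highest-scoring parse into alternating segments.

    scores[c][i][j] is the score of interval i..j (inclusive, i <= j)
    under class c, where c is 0 or 1 (for example intron and exon).
    Returns (total, segments), where segments is a list of
    (start, end, class) tuples covering the whole sequence.
    """
    n = len(scores[0])
    neg_inf = float("-inf")

    # best[j][c]: best score for positions 0..j-1 when the last
    # segment has class c and ends at position j-1.
    best = [[neg_inf, neg_inf] for _ in range(n + 1)]
    back = [[None, None] for _ in range(n + 1)]
    best[0] = [0.0, 0.0]  # empty prefix; either class may come first

    for j in range(1, n + 1):
        for c in (0, 1):
            for i in range(j):
                # Last segment is i..j-1 in class c; the segment
                # before it must be in the other class.
                cand = best[i][1 - c] + scores[c][i][j - 1]
                if cand > best[j][c]:
                    best[j][c] = cand
                    back[j][c] = i

    # Trace back from the better of the two final classes.
    c = 0 if best[n][0] >= best[n][1] else 1
    total = best[n][c]
    segments = []
    j = n
    while j > 0:
        i = back[j][c]
        segments.append((i, j - 1, c))
        j, c = i, 1 - c
    segments.reverse()
    return total, segments
```

Because every interval score is already in the table, the search
itself costs only O(N^2) lookups and additions, which is why the
all-intervals scoring is what makes a global optimum practical.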


Eric E. Snyder
Sequana Therapeutics, Inc.
11099 North Torrey Pines Road, Suite 160
La Jolla, CA 92037
(619) 452-6550 ex 279; (619) 452-6653 fax; eesnyder at sequana.com


