In article <2050 at> toms at (Tom Schneider) writes:
>I think that Mark is exactly correct, and you have missed the point.  Having a
>huge database full of human sequences opens vistas for those of us who know how
>to use statistical tools to analyse sequences.  There are many things that can
>be done.  Some of them include learning how to identify genes from raw

Ok, I agree that it is possible to use statistical methods to infer that
a given sequence contains a "gene". If I read your perspective correctly
(and ignoring the self back-patting), the main goal is to beef up the
database so that we can find new genes, whether functional or not.
I'm sorry, but I just don't see that as cost effective, given that we won't
have the slightest inkling of what most of these genes are supposed to

>A straight sequencing of the genome will avoid the terrible biases that we
>currently have in the GenBank database.  For example, the database is missing

Oh really? Wouldn't you say that concentrating on E. coli, fly, worm, yeast,
human, and maybe a plant species puts a bit of bias into the database?

>the insides of introns.  If you think that these are not important, then you
>may well be in for some super surprises later.  The phrase "junk DNA" is a
>statement of ignorance, not scientific fact.  People currently chop off the
>bases near the 3' sides of introns and don't report them in the database.  The
>proof is that they often end 10, 20 or 30 bases from the splice junction.  This
>would not happen if people reported all their data.  Unfortunately, this means
>that people have thrown out important parts of splice junctions BECAUSE THEY
>THOUGHT THEY WERE UN-IMPORTANT.  Do you follow?  People think something is not
>important, so they don't report it in the database, or limit the reports, so
>nobody discovers that it IS important!  Another example is the reporting of

(Nothing deleted because I am in complete agreement. Oh how I have ranted
and raved about missing intron sequences.)  But frankly, I don't follow
how this serves as part of the defense of the genome project.
Sure, it'd be great to have chromosome-long tracts of sequence to infer
genome organization, but will we really be able to make sense of
it all using the sequence data alone? Take the case of upstream control
regions: their significance was worked out for the most part by experimental
techniques.  Those results are what is used to generate rules
for sequence analysis, not the other way around.

>The second major justification is the enormous boost to sequencing technology
>that the project is making.  We are eventually going to be able to sequence
>everybody's DNA in a few minutes.  This will have enormous medical implications

Ok, that's a valid argument. There's nothing like technological advancement.

