how can I construct a subset of all the sequences?

Rapu Lak rapulak at macc.wisc.edu
Sun Mar 13 16:10:36 EST 1994

This is the problem:

We are interested in various questions about RNA.  Many of these
interests are about eukaryotic mRNA and specifically, the 3' ends,
3' UTRs, 3'-most exons and the 3'-most introns.  I would like to
construct a database that contains these data.  But I know that I
don't know how to do this.  

I imagine it is possible to search the entire DNA sequence database
for all eukaryotic DNA sequences, and then search for ones that
identify a translation STOP codon (from some annotation??), and
then limit the search to those that also identify the site of the
3'-most intron-exon boundary.  Finally, from among these files
identify those that also have a comment (annotation?) about the
site of cleavage and polyadenylation.  Then we would like to
extract only the DNA sequence that corresponds to the last exon as
well as the exon containing the translation stop codon (if they are
different) and the sequence of the 3'UTR up to and including the
site of cleavage and polyadenylation.  So we want to identify the
DNA sequence files that have this information and then create a new
file that contains only the DNA sequence corresponding to the
specific part of the mRNA we are interested in.  I then imagine
that we would have a large group of new files, each containing
annotations to the site of the translation stop codon, maybe the
translation reading frame, the site of cleavage and
polyadenylation, ... (what else?).  These files would define the
database that we are interested in analyzing. 
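
The selection and extraction steps described above can be sketched in
Python as follows.  This is only an illustration under stated
assumptions, not an established tool: the Record and Feature classes
and the function extract_three_prime_region are hypothetical names
invented for the sketch; the feature keys "CDS", "exon" and
"polyA_site" are modelled on sequence database feature tables; each
entry is assumed to be already parsed, to lie on the forward strand,
and to have a CDS annotation whose last base is the last base of the
stop codon.  When the stop-codon exon and the last exon differ, the
sketch keeps the whole contiguous genomic span, including the intron
between them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Feature:
    key: str    # e.g. "CDS", "exon", "polyA_site" (assumed feature keys)
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


@dataclass
class Record:
    accession: str          # database identifier of the entry
    sequence: str           # full DNA sequence of the entry
    features: List[Feature]


def extract_three_prime_region(rec: Record) -> Optional[dict]:
    """Return the 3' region of one entry, or None if annotation is missing.

    The region runs from the first base of the exon that contains the
    stop codon up to and including the annotated polyadenylation site.
    """
    cds = [f for f in rec.features if f.key == "CDS"]
    exons = [f for f in rec.features if f.key == "exon"]
    polya = [f for f in rec.features if f.key == "polyA_site"]
    if not cds or not exons or not polya:
        return None  # entry lacks one of the required annotations

    # Assumption: the CDS ends on the last base of the stop codon.
    stop_end = max(f.end for f in cds)
    polya_pos = max(f.start for f in polya)

    # Find the exon that contains the stop codon.
    stop_exon = next((e for e in exons if e.start <= stop_end <= e.end), None)
    if stop_exon is None or polya_pos < stop_end:
        return None  # inconsistent or incomplete annotation

    region_start = stop_exon.start
    return {
        "accession": rec.accession,
        # Coordinates below are 1-based and relative to the new sequence.
        "stop_codon_end": stop_end - region_start + 1,
        "polyA_site": polya_pos - region_start + 1,
        "sequence": rec.sequence[region_start - 1:polya_pos],
    }


def build_subset(records: List[Record]) -> List[dict]:
    """Keep only the entries that carry all required annotations."""
    regions = (extract_three_prime_region(r) for r in records)
    return [r for r in regions if r is not None]
```

Each dictionary returned by build_subset corresponds to one of the
"new files" described above: the extracted sequence together with the
positions of the stop codon and the polyadenylation site, renumbered
relative to the extracted sequence.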

Am I dreaming the impossible dream?  I suppose I can ask at least
one relevant question.  Are there tools or some established
programs for doing this kind of database search and database
creation?  I'd like to hear from people who have used such
programs, from people who have written such programs, and from
people who might be interested in trying such a program to actually
create such a database for us.  I would like to make contact with
those on the "net" who are comfortable doing these kinds of
manipulations and could possibly guide me to success.  Also, what
and where are the relevant FAQs for this subject/problem?  Have I
posted this question to the appropriate newsgroup?

Please reply to me directly by email and I will post a SUMMARY of
what comes my way.

Rock Pulak
rapulak at macc.wisc.edu

More information about the Embl-db mailing list
