Gene identification troubles (bit long)

tammy english tenglish at
Mon Mar 2 12:01:11 EST 1998

I have several questions about comparative genetics that I hope
someone can help shed light on, or at least point me to sources that
can.  Any help/advice appreciated and feel free to respond via e-mail
to: tenglish at

We work on a group of weird and wonderful animals that survive
freezing temperatures, anoxia/hypoxia, estivation, etc. for extended
periods of time.  In an effort to determine how they respond to these
stresses at a genetic level, our experiments have taken the following
approach:

1)  Make a cDNA library from mRNA isolated from tissues of
stressed (frozen, anoxic, etc.) animals.
2)  Screen the library with a probe made from stress-condition mRNA,
then screen with a probe made from mRNA of control (no experimental
treatment) animals.
3)  Isolate clones present in the stress, but not the control, condition.
4)  Confirm the gene to be up-regulated (northern blot),
sequence it, and try to identify it (primarily using GenBank BLAST).
The problems:
1) Roughly 50% of the clones show NO homology with anything in GenBank.
2) About 10-25% of those clones that do elicit BLAST hits are
listed as ribosomal or mitochondrial DNA. In many cases I believe
these high-probability hits (for 16S and 18S rRNA) are false, primarily
because the gene for 16S rRNA from L. littorea (the species I work
with) has already been sequenced and is already in GenBank.  In such a
case, the match should be perfect, not just "good".
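
The "perfect versus good" distinction in problem 2 comes down to percent identity over the aligned region. A minimal sketch, assuming the two sequences have already been aligned to the same length (gaps written as "-"); the function name and the example sequences are made up for illustration and are not real L. littorea data.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns where both sequences carry the same base.

    Columns containing a gap ('-') in either sequence count as mismatches.
    """
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("sequences must be aligned to the same non-zero length")
    a, b = aligned_a.upper(), aligned_b.upper()
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


# Hypothetical clone fragment versus a hypothetical database entry.
clone = "ACGTTCGTAC"
reference = "ACGTACGTAC"
print(percent_identity(clone, reference))  # 90.0
```

A clone that really is the already-sequenced 16S gene should score at or very near 100 against that entry; a clearly lower score suggests a related but different sequence.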

Our questions: 
1)  Why do so many genes show homology with ribosomal RNA? 
2)  What do we do with a gene that has been sequenced from start
to finish (UTR, start codon, stop codon, poly A signal and poly A
tail) but still shows no homology with anything in GenBank?  What
further experiments can be done?
3)  Are there other databases that are more useful to those of
us working with invertebrates?  
4)  Do invertebrates have variations in their codon language that I
am not aware of (e.g. different stop codons or a tendency to "wobble"
the AA/codon pair)?
5)  We have tried translating the gene sequence into amino acids,
then searching the "protein" sequence for motifs (via a database).
How reliable are the results?  How much useful information can be
obtained this way?
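
Questions 4 and 5 can be made concrete with a small sketch: translate a sequence under two genetic codes, then scan the result for a motif. The assumptions are stated here and in the comments: the two tables are the NCBI standard code and the invertebrate mitochondrial code (which differs only for AGA/AGG, ATA and TGA, and applies only to mitochondrially encoded transcripts), the motif is the PROSITE-style N-glycosylation pattern N-{P}-[ST]-{P}, and the DNA string and function names are invented for illustration.

```python
import re
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = ["".join(c) for c in product(BASES, repeat=3)]
STANDARD = dict(zip(CODONS, STANDARD_AA))

# Invertebrate mitochondrial code: same as standard except these four codons.
INVERT_MITO = {**STANDARD, "AGA": "S", "AGG": "S", "ATA": "M", "TGA": "W"}


def translate(dna, table=STANDARD, frame=0):
    """Translate one reading frame; unknown codons become 'X', stops become '*'."""
    dna = dna.upper().replace("U", "T")
    return "".join(
        table.get(dna[i:i + 3], "X") for i in range(frame, len(dna) - 2, 3)
    )


def find_motif(protein, pattern=r"N[^P][ST][^P]"):
    """Return (position, match) pairs, including overlapping matches."""
    return [(m.start(), m.group(1)) for m in re.finditer(f"(?=({pattern}))", protein)]


# Hypothetical sequence ending in TGA: a stop in one code, Trp in the other.
dna = "ATGAATGGTACTGCATGA"
print(translate(dna))                # MNGTA*
print(translate(dna, INVERT_MITO))   # MNGTAW
print(find_motif(translate(dna)))    # [(1, 'NGTA')]
```

Short patterns like this one turn up by chance in many unrelated proteins, so a hit from this kind of scan is a hint to follow up on rather than an identification.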

thanks for any help.

More information about the Methods mailing list