Gene identification troubles (bit long)

Vladimir Svetlov svetlov at
Wed Mar 4 12:54:45 EST 1998

In article <6deokn$ck5$1 at>, tammy english
<tenglish at> wrote:

> Our questions: 
> 1)  Why do so many genes show homology with ribosomal RNA? 

Because genomic DNAs contain tri- to pentanucleotide repeats, and ribosomal
RNA genes are among the biggest. BLAST gives you the overall sequence
"convergence", not normalized to size.
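
One way to check whether a query carries such repeats before reading too much into its hits is to scan it for short tandem repeats. The following is only a rough sketch: the function name, the thresholds, and the example sequence are made up for illustration, and it finds only perfect repeats.

```python
import re

def find_short_tandem_repeats(seq, min_unit=3, max_unit=5, min_copies=4):
    """Return sorted (start, unit, copies) tuples for perfect tandem repeats.

    Only repeat units of min_unit..max_unit bases are considered
    (tri- to pentanucleotide by default). Thresholds are illustrative.
    """
    seq = seq.upper()
    hits = []
    for unit_len in range(min_unit, max_unit + 1):
        # One unit, followed by at least (min_copies - 1) further copies of it.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_copies - 1))
        for match in pattern.finditer(seq):
            copies = len(match.group(0)) // unit_len
            hits.append((match.start(), match.group(1), copies))
    return sorted(hits)

if __name__ == "__main__":
    # Hypothetical query containing a (CAG)6 stretch.
    query = "GGATC" + "CAG" * 6 + "TTGACC"
    for start, unit, copies in find_short_tandem_repeats(query):
        print(start, unit, copies)
```

If a large part of an alignment falls inside a region flagged this way, the hit probably says more about the repeat than about the gene.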

> 2)  What do we do with a gene that has been sequenced from start
> to finish (UTR, start codon, stop codon, poly A signal and poly A
> tail) but still shows no homology with anything in GenBank???  What
> further experiments can be done?

First of all, do the conceptual translation and search the protein
databases, including the specialized ones (S. cerevisiae, A. thaliana,
etc.). For example, yesterday a student in our lab could not find a
sigma-like factor in A. thaliana by running BLAST with sigma-70 against the
entire SwissProt: it was not within the first 200 hits. Repeated against the
A. thaliana database, the factor came out on top. Depending on the
evolutionary relations of your subject, you may fish out something. As far
as experiments go, knock it out or shove it into yeast and see what happens
(a very popular approach with people doing mammalian genome projects).
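
A conceptual translation can be sketched in a few lines. This is a minimal example that assumes the standard genetic code and translates all six reading frames; the function names are my own, and the resulting peptides would then be used as queries against the protein databases.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(seq):
    """Translate from the first base; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))

def six_frame_translation(seq):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to peptides."""
    seq = seq.upper()
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames["%s%d" % (strand, offset + 1)] = translate(strand_seq[offset:])
    return frames

if __name__ == "__main__":
    for label, peptide in six_frame_translation("ATGGCCTAA").items():
        print(label, peptide)
```

For a cDNA sequenced from start codon to stop codon, the frame of interest is already known, so `translate` on the coding region alone is enough.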

> 3)  Are there other databases that are more useful to those of
> us working with invertebrates? 

Try different ones, and search with proteins rather than with DNA.

> 4)  Do inverts have variations in their codon language that I am
> not aware of (e.g. different stop codons or a tendency to "wobble" the
> AA/codon pair)?

Not unless you are picking mitochondrial RNAs.
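
To make the mitochondrial exception concrete, here is a small sketch comparing the standard code with the invertebrate mitochondrial code. I am assuming NCBI translation table 5 as the relevant one; the helper names and the test sequence are invented for the example.

```python
from itertools import product

BASES = "TCAG"
# Amino acids in NCBI codon order (TTT, TTC, TTA, TTG, TCT, ...).
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Assumed here: NCBI translation table 5, invertebrate mitochondrial.
INVERT_MITO = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG"

def make_table(amino_acids):
    """Build a codon -> amino acid dict from a 64-letter string."""
    return {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), amino_acids)}

def translate(seq, table):
    """Translate from the first base; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(table.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))

def codon_differences(table_a, table_b):
    """Return {codon: (aa_in_a, aa_in_b)} for codons that differ."""
    return {codon: (table_a[codon], table_b[codon])
            for codon in table_a if table_a[codon] != table_b[codon]}

if __name__ == "__main__":
    std, mito = make_table(STANDARD), make_table(INVERT_MITO)
    print(codon_differences(std, mito))
    seq = "ATGATATGAAGATAA"  # made-up sequence touching the variant codons
    print(translate(seq, std))
    print(translate(seq, mito))
```

A nuclear transcript read with the wrong table would show premature stops or odd substitutions, so it is worth knowing which compartment the message came from.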

> 5)  We have tried translating the gene sequence into amino acids
> then searching the "protein" sequence for motifs (via a database).
> How reliable are the results?  How much useful information can be
> obtained this way?

The results are always reliable; what you should worry about is your
interpretation of them. Prosite and Blocks will most likely find something
that matches their criteria for a motif. It is up to you to check the
spacing and the overall organization of the protein, and to do a parsimony
tree on other, bona fide members of this class plus your protein and a few
random picks. There are a number of cases where a very good motif turns out
to be dispensable or outright irrelevant to the function of the protein; in
other words, the functional analysis would have to be done anyway.
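
To see why a pattern match by itself proves little, it helps to look at how simple these patterns are. Below is a rough sketch that converts a PROSITE-style pattern into a regular expression and scans a protein with it. The converter handles only the basic syntax, the function names are mine, and the peptide is invented. The example pattern is the PROSITE N-glycosylation site, N-{P}-[ST]-{P}.

```python
import re

# One pattern element: optional '<', a residue spec, optional (n) or (n,m), optional '>'.
ELEMENT = re.compile(
    r"(<?)(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+(?:,\d+)?)\))?(>?)")

def prosite_to_regex(pattern):
    """Convert a simple PROSITE pattern to a Python regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError("unsupported pattern element: %r" % element)
        start, core, count, end = match.groups()
        if core == "x":
            core = "."                        # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"    # any residue except these
        parts.append(("^" if start else "") + core
                     + ("{%s}" % count if count else "")
                     + ("$" if end else ""))
    return "".join(parts)

def find_motif(pattern, protein):
    """Return (start, matched_text) for all matches, overlapping ones included."""
    regex = "(?=(" + prosite_to_regex(pattern) + "))"
    return [(m.start(), m.group(1)) for m in re.finditer(regex, protein.upper())]

if __name__ == "__main__":
    peptide = "MKNGSAANPTQNVTL"  # made-up sequence
    print(prosite_to_regex("N-{P}-[ST]-{P}"))
    print(find_motif("N-{P}-[ST]-{P}", peptide))
```

A four-residue pattern like this one will turn up in a great many unrelated proteins, which is exactly why the spacing, the overall organization, and the tree matter more than the hit.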


> thanks for any help.

Vladimir Svetlov
McArdle Lab for Cancer Research
Dept. Oncology
1400 University Ave.
Madison, WI 53706
