Predicting the function of novel gene(s)!

Malay curiouser at
Fri Jun 15 06:42:24 EST 2001

Functional prediction is an art. I suggest you avoid Prosite; it will
bias your opinion. The most important tools are PSI-BLAST and homology
modelling, and the most important database is BLOCKS. There are no strict
protocols, and you need to be very lucky to get any meaningful information.
Briefly, this is what the successful people did:

#1. Search GenBank with PSI-BLAST at an E-value cutoff of 0.01 until the
search converges.
#2. Align all the sequences from the converged search with CLUSTAL.
#3. Take the most conserved sequence region and search it against the BLOCKS
database to see whether you can pick up any conserved block in your sequence.
#4. You can even remove coiled-coil regions from your sequence and try
homology modelling with the sequence hits from BLOCKS.
#5. Try different combinations of the previous steps.
#6. Cross your fingers whenever you submit your sequence to PSI-BLAST :-)
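
Steps #2 and #3 can be sketched in a few lines of Python. This is only a rough illustration, assuming the CLUSTAL alignment has already been read into a list of equal-length strings with '-' for gaps. The function names and the toy sequences are made up for the example, and the score is a simple "fraction sharing the commonest residue" measure, not the method BLOCKS itself uses.

```python
from collections import Counter


def column_conservation(aligned):
    """Fraction of sequences sharing the most common residue in each column.

    Gap characters ('-') never count as the conserved residue.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    scores = []
    for column in zip(*aligned):
        counts = Counter(c for c in column if c != "-")
        top = counts.most_common(1)[0][1] if counts else 0
        scores.append(top / len(aligned))
    return scores


def most_conserved_window(aligned, width):
    """Return (start, segments) for the best-scoring ungapped-width window.

    'start' is a 0-based alignment column; ties go to the leftmost window.
    """
    scores = column_conservation(aligned)
    if width > len(scores):
        raise ValueError("window is wider than the alignment")
    start = max(
        range(len(scores) - width + 1),
        key=lambda i: sum(scores[i:i + width]),
    )
    return start, [seq[start:start + width] for seq in aligned]


# Toy alignment (invented for illustration only).
toy = ["MKTAYIAKQR", "LRSAYIAKLD", "-QPAYIAKEE"]
start, segments = most_conserved_window(toy, 5)
print(start, segments)
```

The segment picked this way is what you would paste into the BLOCKS search; in practice you would try several window widths.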

All the best.


Malay Kumar Basu
Centre for Cellular and Molecular Biology
Hyderabad 500007

Fax: (00-91)40-7171195
Phone: (00-91)40-7172241
Peace through superior firepower.
curiouser at

----- Original Message -----
From: "Frank O. Fackelmayer" <Frank at>
To: <methods at>
Sent: Friday, June 15, 2001 4:03 PM
Subject: Re: Prediction the function of Novel gene(s)!

> "R. Jayakumar" wrote:
> >
> > hi..
> >    Partial sequences will do fine to find what kind of gene they are. What
> > you should do is this: first of all, make sure there is no sequencing error
> > there. Then check the ORFs in all possible frames (6 possibilities). If
> > the sequence is an internal part of a gene, then you should get an open frame
> > with no start or end. But take care to see whether there are any
> > errors like insertion or deletion of a bp, which can cause a frameshift.
> > Take the open frame, translate the protein for that frame, and use
> > blastp, or blastx (if you want to use the DNA sequence), and that should
> > pick out the gene from the database. Sometimes I submit the sequence as
> > such and do a FASTA (at with it for identifying the gene.
> > But this is not advisable because of codon degeneracy and ATGC bias.
> > So it is always advisable to translate it into a protein sequence and then
> > to do the BLASTing.
> >     I normally use the FRAMES tool in GCG for the ORF checking. But there
> > are other programs at a website called GENEMARK (but not
> > satisfactory) for doing this. You can also search for ESTs within your
> > sequences to check whether they are significant. You can also do a
> > motif search in the protein sequence. You should try PREDICTPROTEIN or
> > PROTPARAM with the translated protein; that should help a lot.
> >    best of luck
> > jayakumar
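
The six-frame ORF check described above can be sketched without GCG. This is a minimal illustration, assuming the standard genetic code and a plain A/C/G/T sequence; the function names and the example sequence are invented, and it is not how FRAMES works internally. It reports the longest stretch without a stop codon, which is what you expect from an internal gene fragment with no start or end.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Reverse complement of an upper-case A/C/G/T string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons (e.g. with N) give 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Translations for frames +1, +2, +3 and -1, -2, -3 ('*' marks a stop)."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


def longest_open_stretch(dna):
    """Return (frame, peptide) for the longest stop-free stretch in any frame."""
    best = ("", "")
    for name, protein in six_frames(dna).items():
        for segment in protein.split("*"):
            if len(segment) > len(best[1]):
                best = (name, segment)
    return best


# Invented example fragment.
print(longest_open_stretch("ATGGCCAAATTTGGGTAA"))
```

A single-base insertion or deletion shifts the open stretch from one frame to another partway along the read, so a long open frame that suddenly breaks while a neighbouring frame opens up is worth rechecking against the sequencing trace.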
> That will, of course, only work for known genes, and I guess the
> original poster already did it. Otherwise he wouldn't be able to say it
> is a novel gene...
> As to defining the function of a really novel gene, the first approach
> would be to do a Prosite search for functional domains. Note that not all
> hits you'll get are meaningful, and for defining the function only the
> hits to functional sequences may be considered (not those for
> modifications, even though these MIGHT be helpful at a later step)! With
> luck you find reasonable homology to a known domain, e.g. a catalytic
> domain of an enzyme. When you don't - and that is not uncommon for a
> really novel gene from an organism with no or limited genomic
> information - you will have to resort to benchwork.
> A good approach is to use any of the methods to identify interaction
> partners of your new protein, e.g. immunoprecipitation (and
> identification of co-precipitated proteins by e.g. mass spectrometry) or
> two hybrid experiments. With luck, your protein interacts with a known
> protein, and you'll have a first hint as to the cellular pathway your
> protein might be involved in. Without luck, you still don't know
> anything after a year of hard work...
> Frank
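
To see why Prosite hits to modification sites are rarely informative, it helps to scan a sequence by hand. The sketch below converts a Prosite-style pattern into a Python regular expression. It is a simplified converter written for illustration (function names are made up, and it does not handle every corner of the Prosite syntax, such as '>' inside square brackets). The pattern used is the N-glycosylation site, N-{P}-[ST]-{P}, and the test protein is invented.

```python
import re


def prosite_to_regex(pattern):
    """Convert a Prosite-style pattern string to a Python regex string.

    Handles x, [ABC], {ABC}, (n) and (n,m) repeats, and the '<' / '>' anchors.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = repeat = ""
        if element.startswith("<"):          # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):            # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        if element.endswith(")"):            # repetition, e.g. x(2,4)
            element, count = element[:-1].split("(")
            repeat = "{" + count + "}"
        if element == "x":                   # any residue
            core = "."
        elif element.startswith("{"):        # any residue except these
            core = "[^" + element[1:-1] + "]"
        else:                                # single residue or [ABC] class
            core = element
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


def find_sites(pattern, protein):
    """Return (1-based position, matched text) for all matches, overlaps included."""
    lookahead = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in lookahead.finditer(protein)]


# Invented protein fragment scanned for N-glycosylation-type sites.
print(find_sites("N-{P}-[ST]-{P}.", "MNGSANPTQNKTW"))
```

A four-residue pattern like this one matches by chance in most proteins of ordinary length, which is why only hits to longer, functional-domain patterns should be taken as hints about function.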

