Software for identifying new members of gene families

William R. Pearson wrp at alpha0.bioch.virginia.edu
Tue Nov 2 08:54:30 EST 1999


We have developed a simple computational/graphical strategy for
screening the nightly updates of Genbank for new members of large
protein families.  (In practice we run the screens weekly, on the
7 nightly updates from that week.)

     Retief, J. D., Lynch, K. R., and Pearson, W. R. (1999)
     Panning for genes - a visual strategy for identifying novel
     gene orthologs and paralogs. Genome Res. 9:373-382.

The software is available free for academic users from
http://www.uvasoftware.org.

The strategy uses tfastx3 to search DNA databases with 20-60 protein
query sequences, which represent the different known branches of the
protein family.  The 20-60 sets of tfastx3 search results are then
scanned, rearranged, and summarized graphically in a way that greatly
simplifies identification of new family members.  For examples, see:

	 http://fasta.bioch.virginia.edu/fasta/pan_demo/gt_demo.pdf
	 http://fasta.bioch.virginia.edu/fasta/pan_demo/gtm_demo.pdf
	 http://fasta.bioch.virginia.edu/fasta/pan_demo/gstn_old.pdf
	 http://fasta.bioch.virginia.edu/fasta/pan_demo/gstn_new.pdf

If you view these pages using the Acrobat plug-in for Netscape or IE,
you can click on each panel to see the underlying alignments.
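
As a rough illustration of the "scan and rearrange" step described
above, the sketch below regroups hits from several query searches by
library sequence, so that a database entry found by more than one
branch of the family shows up as one row.  This is not the released
software and not necessarily how it works.  It assumes the search
output has already been parsed into (query, library sequence,
E-value) tuples; the function names, the 1e-3 cutoff, and the idea of
passing in a set of already-known members are all hypothetical.

```python
from collections import defaultdict


def group_hits(hits, threshold=1e-3):
    """Group significant hits by library sequence.

    `hits` is an iterable of (query_id, library_id, evalue) tuples.
    The tuple layout and the default cutoff are assumptions made for
    this sketch.  For each library sequence, keep the best (lowest)
    E-value seen for each query.
    """
    by_library = defaultdict(dict)
    for query, library, evalue in hits:
        if evalue > threshold:
            continue  # drop hits that are not significant
        best = by_library[library].get(query)
        if best is None or evalue < best:
            by_library[library][query] = evalue
    return dict(by_library)


def rank_candidates(by_library, known_members):
    """List library sequences that are not already known members.

    Each row is (library_id, best_query, best_evalue, n_queries).
    Rows are sorted by best E-value, then by how many different
    queries found the sequence (more queries first).
    """
    rows = []
    for library, per_query in by_library.items():
        if library in known_members:
            continue
        best_query = min(per_query, key=per_query.get)
        rows.append(
            (library, best_query, per_query[best_query], len(per_query))
        )
    rows.sort(key=lambda row: (row[2], -row[3]))
    return rows
```

Each row names the query (that is, the family branch) that matched a
candidate best, which is the kind of per-candidate summary a graphical
display could then be built on.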

I mention this because several investigators in this field were
unaware of the approach and were scanning for new gene family members
in a more cumbersome fashion.  The program is relatively easy to set
up if you are getting your data from the NCBI (e.g. est_human or
genbank nightly updates) and you have the fasta3 package of programs
from the University of Virginia.  People have had more difficulty
when using other databases, and the program will not work with the
GCG version of FASTA.

Bill Pearson





