Transcription factor software

Wed May 20 08:14:49 EST 1992

In <9205192318.AA16647 at> danj at WELCHGATE.WELCH.JHU.EDU writes:

> Mauricio M. Bustos writes:
> >I am looking for a program to check new peptide sequences for motifs
> >(domains?) characteristic of known transcription factors.  For instance,
> >zinc-fingers, leucine zippers, helix-loop-helix, homeodomains, POU domains
> >etc.  It would have to run on VAX/VMS, UNIX or PC machines.  Any help at
> >all will be greatly appreciated.
> Well, there are a few things which you might find useful.  Prosearch will
> take a protein and look for known protein motifs by comparing it to the
> Prosite database.  I believe that you can also use Prosearch to compare a
> protein to the TFD (Transcription Factor Database) - if you use
> tfd2prosite.c to convert the TFD to Prosite format.  Of course you may also
> find the TFD alone to be of use with other comparison software.  (There is a
> Mac version of prosearch called Macpattern too.) 
> Best of luck,
> Dan Jacobson
> danj at
> { text deleted }

	For GCG sites, there are a couple of possibilities for doing analyses
using the TFD.  One comes with the package and utilizes the FindPatterns
program.  Another can be implemented for the Motifs program.  Although Motifs
ordinarily analyzes protein sequences with Prosite, we have been able to make
it work with nucleotide sequences after the setup described below.

	The TFD is available in the file  SITEDATA.GCG  (available via anon.
ftp from  in  /repository/TFD/datasets).  Feedback from tech
support at GCG has provided info on using this with the FindPatterns, Map,
MapSort, and MapPlot programs.  However, one must either split SiteData.GCG
into subfiles which have no more than 1000 site definitions or modify the GCG
source code for these programs and then recompile so that they handle the
entire 1887 sites listed in the TFD.
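
	The splitting step can be sketched roughly as follows.  This is only an
illustration in Python, not a GCG tool: it assumes the site definitions have
already been read into a list with one definition per entry, and it ignores
any header text the real SiteData.GCG file carries.  The name  split_sites
is hypothetical.

```python
def split_sites(site_lines, max_sites=1000):
    """Split a list of site definitions into chunks of at most max_sites.

    Assumption: each list entry is one complete site definition.
    Any file header would have to be copied into each subfile separately.
    """
    return [site_lines[i:i + max_sites]
            for i in range(0, len(site_lines), max_sites)]


if __name__ == "__main__":
    # Stand-in data: 1887 placeholder definitions, matching the TFD site count.
    sites = [f"SITE{n}" for n in range(1887)]
    chunks = split_sites(sites)
    print([len(c) for c in chunks])  # two subfiles: 1000 and 887 entries
```

With 1887 sites and a limit of 1000, this yields two subfiles, each of which
would then be written out with its own copy of the file header.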

	As GCG's FindPatterns is capable of searching for sequence patterns on
both strands of a NT sequence, it is attractive for locating sites recognized
by transcription factors.  However, the program provides no way to include
documentation about a particular site in its results file, so users must
look each site up in SiteData.GCG themselves.  While this is doable, it's not
as easy or convenient as one might like.
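
	To illustrate what searching both strands involves, here is a minimal
Python sketch.  It is not how FindPatterns is implemented; it simply looks for
a site and for its reverse complement in the forward-strand sequence.  The
function names and the example sequence are made up for illustration, and
only unambiguous A/C/G/T sites are handled.

```python
import re

# Complement table for unambiguous nucleotides only (an assumption of this sketch).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_both_strands(seq, site):
    """Return (strand, start) pairs; start is 0-based on the forward strand."""
    # A lookahead is used so that overlapping occurrences are also reported.
    hits = [("+", m.start())
            for m in re.finditer(f"(?={re.escape(site)})", seq)]
    rc = reverse_complement(site)
    if rc != site:  # palindromic sites would otherwise be reported twice
        hits += [("-", m.start())
                 for m in re.finditer(f"(?={re.escape(rc)})", seq)]
    return sorted(hits, key=lambda h: h[1])


if __name__ == "__main__":
    print(find_both_strands("AATGACTCATTTGAGTCA", "TGACTCA"))
    # [('+', 2), ('-', 11)]
```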

	As an alternative to using FindPatterns, we have found that it is possible
to use the Motifs program (part of GCG v.7) to perform the desired analyses, and
to obtain documentation in the output file as well.  To accomplish this, it was
necessary to first create a  TFD.Patterns  file (with a format like that in
GCG's  Prosite.Patterns ) and a set of  .TFdoc  files using the information
found in  SITEDATA.GCG  (while the GCG package has a  TFsites.DAT  file, it
doesn't contain all the information found in SITEDATA.GCG).  In creating 
TFD.Patterns it is also necessary to expand any ambiguous NT symbols to
patterns containing the alternate NTs (e.g.,  Y  =>  (C,T) ).  This in effect
creates search patterns in a 'prosite' format which Motifs can readily use.
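
	The expansion of ambiguous symbols can be sketched as below.  This is a
Python illustration, not the conversion program itself: the table holds the
standard IUPAC nucleotide ambiguity codes, while the function name and the
exact output layout (alternatives in parentheses, separated by commas, with
no other separators) are assumptions based on the  (C,T)  notation used here.

```python
# Standard IUPAC nucleotide ambiguity codes and the bases each one stands for.
IUPAC = {
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "H": "ACT", "B": "CGT", "V": "ACG", "D": "AGT", "N": "ACGT",
}


def expand_ambiguous(pattern):
    """Replace each ambiguity code with a parenthesized list of alternatives.

    Unambiguous symbols (A, C, G, T) are passed through unchanged.
    """
    out = []
    for sym in pattern.upper():
        if sym in IUPAC:
            out.append("(" + ",".join(IUPAC[sym]) + ")")
        else:
            out.append(sym)
    return "".join(out)


if __name__ == "__main__":
    print(expand_ambiguous("TGAYRN"))  # TGA(C,T)(A,G)(A,C,G,T)
```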

	Creating  TFD.Patterns  and associated  .TFdoc  files was initially
done using a DCL command procedure.  However, we now have a  C program  that
completes the conversion much more quickly (and also potentially provides a
way to implement the use of the TFD with Motifs in the Unix version of GCG
- an option which has not yet been tested).

	If anyone's interested in making Motifs analyze NT sequences for sites
recognized by transcription factors, send an email request for the C program
and notes on installing the files generated by it.


   _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
 / Michael J. Weise, Ph.D.    \  Univ.of Ga. BioScience Computing Facility \
(   weise at         \   Dept.of Genetics  UGa, Athens  GA  30602 )
 \ _ _ _'Tis_only_me_speak'n._ _\_ _ _ _ _ _ _ (706) 542-1409_ _ _ _ _ _ _ /
