Kyte-Doolittle hydropathicity "motif" analysis

Jonathan Epstein Jonathan_Epstein at
Tue Sep 1 11:17:53 EST 1998

One of the scientists whom I support wishes to search SwissProt using a
non-traditional form of motif searching.  Instead of finding proteins
whose amino acids are conserved with respect to a query sequence or
query motif, this method calls for conservation of what I will call
"window scores", which are computed as follows for a window size N
(e.g., N=10).

For a given database sequence which is M residues long, compute M-N+1
scores, one for each window of size N.  Each score is the sum of the
Kyte-Doolittle hydropathicity values over that window.  Ref:
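
To make the definition concrete, here is a minimal Python sketch of the
window-score computation.  The function name window_scores and the
choice to fail on non-standard residue codes are assumptions made for
illustration; the table is the standard Kyte-Doolittle scale.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_scores(seq, n=10):
    """Return the M-N+1 window sums for a sequence of M residues.

    Raises KeyError on residue codes outside the 20 standard ones
    (an assumption of this sketch; real database entries may contain
    codes such as X or B that need a policy of their own).
    """
    values = [KD[aa] for aa in seq.upper()]
    # One sum per window start position; empty if the sequence is
    # shorter than the window.
    return [sum(values[i:i + n]) for i in range(len(values) - n + 1)]
```

For example, window_scores(seq, 10) on a 25-residue sequence returns 16
scores, one per window position.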

My colleague writes:

  ... the amino acid sequences are completely non-conserved even within
known functional families.  Despite the lack of conservation at the
primary amino acid level in this class of proteins, when the
Kyte-Doolittle plots for a functional family of transporters are
aligned visually, the conservation of patterns is clear, as is the
position of one or two residues relative to the hydrophobicity of the
plot.
Therefore, the basis of functional families is largely the pattern in
hydrophobic residues.

  In the typical case, the user will have a novel highly hydrophobic
protein with no known relatives.  

  The most complete method would be to take Kyte-Doolittle scores window
by window sequence by sequence of the entire SwissProt or Genpept
database and then compare these scores to the Kyte-Doolittle scores for
the query sequence.
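
One possible reading of "compare these scores" is sketched below in
Python: slide the query's window-score profile along a database
sequence's profile, without gaps, and keep the offset with the highest
Pearson correlation.  The choice of Pearson correlation, the ungapped
comparison, and the names pearson and best_profile_match are all
assumptions for illustration, not something my colleague specified.
Both inputs are plain lists of window scores.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is flat)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def best_profile_match(query, target):
    """Return (best_r, offset) for the best ungapped placement of the
    query profile on the target profile, or None if the target profile
    is shorter than the query profile."""
    m = len(query)
    if m == 0 or len(target) < m:
        return None
    best = None
    for offset in range(len(target) - m + 1):
        r = pearson(query, target[offset:offset + m])
        if best is None or r > best[0]:
            best = (r, offset)
    return best
```

Searching a whole database this way would mean calling
best_profile_match once per database sequence and ranking the results
by correlation.  Note that correlation ignores the absolute level of
hydrophobicity and only rewards a similar shape, which may or may not
be what is wanted here.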

My question is: does anyone know of any way to achieve this while
piggybacking off of existing tools?  E.g., BLAST certainly has a notion
of a window, and one could modify the scoring matrix to closely simulate
the Kyte-Doolittle scores, but fundamentally BLAST is still trying to
compare sequences, not graphs.  FASTA, about which I am less
knowledgeable, includes programs (GREASE, TGREASE) for displaying these
Kyte-Doolittle graphs, but as far as I can tell doesn't provide a
mechanism for achieving the stated goals.

BTW, there are copies of the various sequence databases on the hard disk
of a local SGI machine.

Thanks in advance for any advice,

- Jonathan

Jonathan Epstein                                
Jonathan_Epstein at
Unit on Biologic Computation                    (301)402-4563
Office of the Scientific Director               Bldg 31, Room 2A46
Nat. Institute of Child Health and Development  31 Center Drive
National Institutes of Health                   20892
