Hi, all!
I have a protein sequence and also a published consensus sequence
(for the WD repeat). The consensus has positions with conserved residues,
positions where any one of a defined subset of amino acids is accepted, and
stretches with no restriction on sequence but with limits on their length.
Furthermore, family members have 4-8 copies of the repeat per peptide with
each repeat varying independently from the consensus. Here is the
expression (from Nature. 1994 Sep 22; 371(6495): 297-300), where letters
are the standard one-letter amino acid code (but in lower case), x is any
amino acid, square brackets mean any one of the enclosed residues, and curly
brackets give the allowed number of repeats of the preceding element
(e.g., x{0,3} means from 0 to 3 amino acids of any type).
[gsav]hxxx[livfmca]xx[livfmcas]x{1,7}[fywli]x{0,3}[pndg]x{0,2}[pndg]x{0,4}
[livfmca]{2,3}[gstacy][gstac][gstacy]xdxx[livfmca][livfmca][wfy][dnrk]
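To make the notation concrete: apart from "x", the expression is already
valid regular-expression syntax, so it can be tried directly in a scripting
language. Below is a minimal Python sketch, assuming the two lines above are
one continuous expression and that only exact matches are wanted. The
function names and the toy test sequence are hypothetical (the sequence was
made up to fit the pattern and is not a real WD protein). It reports every
position where the consensus matches exactly; it does not score or rank
near-miss repeats, which "best" would really require.

```python
import re

# The consensus as written above, with the two lines joined into one string.
CONSENSUS = (
    "[gsav]hxxx[livfmca]xx[livfmcas]x{1,7}[fywli]x{0,3}[pndg]x{0,2}[pndg]x{0,4}"
    "[livfmca]{2,3}[gstacy][gstac][gstacy]xdxx[livfmca][livfmca][wfy][dnrk]"
)


def consensus_to_regex(consensus):
    """Translate the notation into a regex string.

    Only 'x' (any residue) needs changing, to '.'; the square and curly
    brackets already mean the same thing in regex syntax. This simple
    replace is safe here because no 'x' occurs inside square brackets.
    """
    return consensus.replace("x", ".")


def find_repeats(sequence, consensus=CONSENSUS):
    """Return (1-based start, matched text) for every exact match.

    The pattern is wrapped in a lookahead so that overlapping matches are
    reported too; matching ignores case.
    """
    lookahead = re.compile("(?=(" + consensus_to_regex(consensus) + "))",
                           re.IGNORECASE)
    return [(m.start() + 1, m.group(1)) for m in lookahead.finditer(sequence)]


if __name__ == "__main__":
    # Toy sequence (assumption): three flanking residues on each side of a
    # stretch built by hand to satisfy the consensus.
    toy = "MMM" + "GHAAALAASKWDGLLGSGKDKKLVWD" + "MMM"
    for start, text in find_repeats(toy):
        print(start, text)
```

Because a regular expression only accepts or rejects, a repeat that departs
from the consensus at even one position is missed entirely, so this is a
quick first pass rather than a substitute for a profile-based search.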
My question is, what computer program (preferably on a Mac or Win PC) can
find all the best repeats in my sequence? The reference above cites some
Unix software available by ftp, and I will ask our computer people how to
load and use it, but I'd like to have something that feels a little more
accessible.
Thank you for your suggestions! If you are responding, please include an
email reply directly to me, so that I don't miss anything.
Michael Stockelman
University of Illinois at Chicago
Department of Molecular Genetics
900 S. Ashland Ave. 60607
michaels at uic.edu
312-996-6996