exons & protein domains

killer yeast vvsvetlov at utmem1.utmem.edu
Wed Jul 17 20:17:50 EST 1996


In article <4sdo1g$dtr at dartvax.dartmouth.edu>, bob.gross at dartmouth.edu
(Bob Gross) wrote:


> Of course we are aware of databases like the Prosite database that contain many
> motifs, but these motifs are usually quite short and probably do not represent
> whole functional domains on proteins. Rather, they often represent short
> targets, e.g. glycosylation sites, phosphorylation sites, etc. However, there
> are some true "domains" such as ATP binding, G-protein GTP binding, DNA
> binding, etc. My question to this group is what "domains" would you start off
> with in testing the grouping algorithm - based on your biological knowledge?

As you already noticed, there are motifs and motifs; there are also people
who in print call 4 amino acids a "domain" (not that I'm pointing a finger
or something <G>). Quite a few "motifs" resulted from a straightforward alignment
of homologous genes/proteins and reflect common ancestry as much as the
functionality of the sequence itself. Because of the way they were identified,
such motifs do not necessarily carry any function (we know examples
of perfect motifs, like Zn-Cys6 binuclear clusters - a bona fide
DNA-binding motif - that are dispensable for function) or
structural/folding autonomy - a criterion often applied (or implied) when
people speak of domains. Personally I share the opinion that a true domain
is kind of a self-sufficient structural and functional element of a protein,
one that carries out a function by itself and folds more or less independently
into a native or near-native conformation. Domains can usually be grafted onto
another protein and retain their functionality - e.g. the activation domain of
Gal4p fused to almost anything in yeast renders the fusion a
transcriptional activator that is still galactose inducible (meaning
Gal80p repressible). Likewise, the GATA-binding Zn finger "motif" present in
most eukaryotes is not limited to the two Cys2 pairs and the segment between them
- mapping and NMR indicated that at least 20 aa C-terminal to them
(the "extended loop") are required for DNA/protein binding and regulation
thereof.
This is a long way of saying that I would look for domains that were mapped
(say, by deletion analysis) as essential for some function - by
virtue of this particular set-up, not only the aa directly involved in making
contacts or in catalysis are picked up but also those that critically affect
folding and presentation of the functional domain/motif. Unfortunately
such work has not yet been made part of Prosite or the other protein databases I
know of (like YPD, although they did admit to me that this sort of
information is of definite value), and people as a rule do not update their
GenBank or SWISS-PROT entries after they have done and published the mapping some
years after the initial cloning and sequencing was reported. For me this
was the major reason to do a compilation of the structural/functional motifs of
yeast transcription factors (it appeared last year in Yeast) - many of those
domains cannot be picked up based on sequence similarity (like
acidic activators), and unlike everyone before us we did supply the actual
positions in the protein sequence (so that people know where as well as
what has been found). One drawback of this approach is that one needs to
actually go through all those papers rather than cutting and pasting from
abstracts... even bigger <G>. I hope that in other fields you can find
something like this already done in a similar form - if not, I'd go get some
papers on mapping and, even better, grafting of functional domains of
whatever proteins you are interested in.
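
For a grouping algorithm, the deletion-mapping idea above could be turned into
sequence coordinates roughly as follows. This is only a minimal sketch: the
Construct record, the mapped_region function and all residue numbers are made
up for illustration and do not come from any real protein or database. It
assumes each construct retains one contiguous stretch of the protein and was
scored simply as functional or not.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Construct:
    """One deletion construct (hypothetical record layout)."""
    start: int        # first retained residue, 1-based
    end: int          # last retained residue, inclusive
    functional: bool  # did the construct still carry out the function?


def mapped_region(constructs: Sequence[Construct]) -> Optional[Tuple[int, int]]:
    """Return the residues shared by every functional construct.

    Anything missing from a construct that still works is dispensable,
    so the essential region must lie inside this intersection.
    """
    active = [c for c in constructs if c.functional]
    if not active:
        return None
    start = max(c.start for c in active)
    end = min(c.end for c in active)
    return (start, end) if start <= end else None


# Made-up deletion series for a 500 aa protein.
series = [
    Construct(1, 500, True),     # full length
    Construct(100, 500, True),   # N-terminal deletion, still works
    Construct(200, 500, False),  # deeper N-terminal deletion, dead
    Construct(1, 300, True),     # C-terminal deletion, still works
    Construct(1, 250, False),    # deeper C-terminal deletion, dead
]

region = mapped_region(series)
print(region)
```

With this made-up series the shared region is residues 100 to 300. Coordinates
of this kind ("where" as well as "what") are what a grouping algorithm would
need as input, and they include the residues needed for folding and
presentation, not just the ones making contacts.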
Hope that helps.
Regards,
Vladimir Svetlov

-- 
Of what use is a philosopher who does not hurt anybody's feelings?
                                Diogenes


