Your thesis of using exons to compare protein domains touches on the somewhat controversial
topic of introns old versus introns new (and, correspondingly, exons old versus exons new). Several
years ago the theory was put forth (I don't remember the author) that new proteins arise by
exon shuffling and that consequently both exons and introns are ancient. However, the fact that
there are introns in genes that have been relatively recently transferred to the nucleus from
mitochondrial or proto-chloroplast genomes supports the alternative theory that some introns
(and therefore exons) are newly derived, possibly through the insertion of transposons into a gene.
There is evidence for both camps, that is, that some exons/introns are old and some are
new. My point is that limiting your analysis to protein domains that lie within a single exon might
miss important members of the family simply because a relatively "new" intron has split the domain.
Your idea of using the GTP-binding domain is a good one (sites I, II, and III, in EF-Tu,
p21RAS, SRP54, etc.), and you might be able to use it to test the correlation of exons with protein
domains. Of course, I am not even mentioning protein splicing or mRNA editing, which would also
affect your analysis of genomic exons/protein domains.
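
To make the "split domain" concern concrete, here is a minimal sketch of how one might check
whether a known domain falls inside one exon or is interrupted by an intron. Everything in it is
an assumption for illustration: the function names are hypothetical, the exon lengths are coding
lengths in nucleotides given in transcript order, and the domain is given as a 0-based half-open
range of amino-acid positions. It ignores protein splicing and mRNA editing entirely.

```python
from typing import List, Tuple


def exons_overlapping_domain(exon_lengths: List[int],
                             domain_aa: Tuple[int, int]) -> List[int]:
    """Return the indices of exons whose coding sequence overlaps a domain.

    exon_lengths: coding length of each exon in nucleotides, in transcript order
                  (hypothetical input; you would derive it from your annotation).
    domain_aa:    domain as a 0-based, half-open (start, end) range of residues.
    """
    # Convert residue positions to nucleotide positions in the spliced CDS.
    dom_start, dom_end = domain_aa[0] * 3, domain_aa[1] * 3
    hits = []
    offset = 0
    for index, length in enumerate(exon_lengths):
        exon_start, exon_end = offset, offset + length
        # Two half-open intervals overlap if each starts before the other ends.
        if exon_start < dom_end and dom_start < exon_end:
            hits.append(index)
        offset = exon_end
    return hits


def domain_is_split(exon_lengths: List[int],
                    domain_aa: Tuple[int, int]) -> bool:
    """True if at least one intron interrupts the domain's coding sequence."""
    return len(exons_overlapping_domain(exon_lengths, domain_aa)) > 1


if __name__ == "__main__":
    # Made-up gene with three coding exons of 90, 60 and 150 nucleotides.
    exons = [90, 60, 150]
    print(exons_overlapping_domain(exons, (10, 40)))
    print(domain_is_split(exons, (0, 30)))
```

An exon-by-exon comparison would only find a domain for which domain_is_split() is False in
every family member; homologs in which the same domain spans two or more exons would need to be
compared at the protein level instead.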
Nothing is ever as simple as it seems.
amief at candelab.berkeley.edu