In article AA14252 at mailgate.eur.nl, themmen at endov.fgg.EUR.NL (themmen) writes:
> Hello Experts
>
> We are dealing with a puzzling problem. Recently we have cloned a cDNA
> from rat testis that is hormone regulated. The cDNA encoded a predicted
> leucine rich protein of 745 amino acids. We are now sequencing the human
> homologous cDNA. It also contains an ORF, but now comes the puzzle: at
> the RNA level the percentage identity (80%) is higher than the percentage
> of identical amino acids (72%). My conviction has been that it should be
> the other way around because of the possibility of the third base in the
> codon to be at least two-fold redundant.
>
> Can somebody give us an idea?
>
> Axel PN Themmen
What about the G+C content in third positions of the gene? It may be extreme
in both sequences, so that percentage identity in third positions HAS to be
high. For example, in such cases, nearly all glutamate codons are GAG rather
than GAA in both sequences.
G+C content at third codon positions in vertebrates depends on where the
gene lies on the chromosome: there are GC-rich and GC-poor regions, called
isochores. For a review, see the work of Bernardi and Mouchiroud.
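To check this on your own sequences, something like the following Python sketch would do. It assumes the input is the coding region only, in frame from the first base of the first codon; the function name and the toy sequences are made up for illustration and are not the rat or human cDNA.

```python
def gc3_fraction(cds):
    """Fraction of G or C at third codon positions of an in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3          # ignore a trailing partial codon
    third = [cds[i + 2] for i in range(0, usable, 3)]
    if not third:
        raise ValueError("sequence shorter than one codon")
    return sum(base in "GC" for base in third) / len(third)


# Toy sequences, invented for illustration only
print(gc3_fraction("GAGCTGGCCGAG"))  # every codon ends in G or C
print(gc3_fraction("GAACTTGCAGAA"))  # no codon ends in G or C
```

If both the rat and the human ORF give a value close to 1 (or close to 0), third positions have little freedom to differ, whatever the degeneracy of the code.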
Further, the indices you use as measures of variability are not clearly
relevant. For one amino acid change, I expect little more than one
nucleotide change in the corresponding codon (usually one, rarely two), so
the percentage of differences at the _non-synonymous_ nucleotide positions
(roughly positions 1 and 2) is expected to be about half the protein one,
as there are roughly two non-synonymous nucleotide positions for each
amino acid position. To compare protein and nucleotide divergences,
you cannot avoid separating synonymous and non-synonymous sites.
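
As a rough illustration of that separation, here is a Python sketch. It takes the crude shortcut used above (positions 1 and 2 counted as non-synonymous, position 3 as synonymous) rather than a proper synonymous/non-synonymous site count, and it assumes the two coding sequences are already aligned in frame without gaps. The function names are invented for the example.

```python
def identity_by_codon_position(seq_a, seq_b):
    """Return [id1, id2, id3], the fraction of identical bases at codon
    positions 1, 2 and 3 of two aligned, gap-free, in-frame coding sequences."""
    if not seq_a or len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("need two equal-length sequences, a multiple of 3 long")
    a, b = seq_a.upper(), seq_b.upper()
    matches = [0, 0, 0]
    for i in range(len(a)):
        if a[i] == b[i]:
            matches[i % 3] += 1
    n_codons = len(a) // 3
    return [m / n_codons for m in matches]


def implied_third_position_identity(nuc_identity, aa_identity):
    """Back-of-envelope estimate, assuming one nucleotide change per amino
    acid change, spread over positions 1 and 2 only."""
    nonsyn_diff = (1 - aa_identity) / 2       # per-site difference, positions 1 and 2
    total_diff = 1 - nuc_identity             # per-site difference, all positions
    third_diff = 3 * total_diff - 2 * nonsyn_diff
    return 1 - third_diff


# Figures from the question: 80% nucleotide identity, 72% amino acid identity
print(round(implied_third_position_identity(0.80, 0.72), 2))  # 0.68
```

Under these assumptions the figures in the question imply about 86% identity at positions 1 and 2 and about 68% at position 3, so an overall nucleotide identity above the amino acid identity is not in itself paradoxical. Running identity_by_codon_position on the real alignment would show how far the third positions actually depart from this estimate.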