Random Thinking on Transition Bias
Transition bias, usually measured by the transition/transversion (s/v)
ratio, is deceptively simple. In this note I will try to outline
the factors that are responsible for the observed s/v ratio for
protein-coding genes in nuclear DNA sequences (the conclusions are
applicable to mitochondrial DNA sequences as well).
A protein-coding sequence can be broken down into nondegenerate
sites, two-fold degenerate sites and four-fold degenerate sites.
Let us first focus on the s/v ratio at the two-fold degenerate
sites, because these sites often are the source of most of the
observed transition bias. The most obvious explanation for the high
transition bias in mtDNA at two-fold degenerate sites is that
transversional mutations at these sites are nonsynonymous and are
reduced by purifying selection, whereas transitional mutations at
these sites are synonymous and are not reduced by purifying
selection. Thus, purifying selection operates differentially
against transversional substitutions, resulting in a high
transition bias.
1. A Simple Model:
It is often helpful to represent one's argument in symbolic form.
The number of transitional substitutions at two-fold degenerate
sites per generation is:
R_t = 2N[u(1)*P(t_1) + u(2)*P(t_2)*2 +...+ u(i)*P(t_i)*i
      +...+ u(n)*P(t_n)*n] ....................................(1)
where N is the effective population size, u(i) is the mutation rate per
generation of the sequence (gene) involving i transitions at
different nucleotide sites of the gene, P(t_i) is the fixation
probability of a mutation involving i transitions, and n is the
number of two-fold degenerate sites, which is the maximum number of
transitional mutations a gene can accumulate in one generation.
The number of transversional substitutions at two-fold degenerate
sites per generation is:
R_v = 2N[v(1)*P(v_1) + v(2)*P(v_2)*2 +...+ v(i)*P(v_i)*i
      +...+ v(n)*P(v_n)*n] ....................................(2)
where v(i) is the mutation rate per generation of the gene
involving i transversions at different sites of the gene, and P(v_i)
is the fixation probability of a mutation involving i transversions.
The s/v ratio is simply
R_t     Sum (from 1 to n) u(i)*P(t_i)*i
--- = ----------------------------------- ......................(3)
R_v     Sum (from 1 to n) v(i)*P(v_i)*i
This equation highlights two main determinants of transition bias.
One is mutation bias, u(i)/v(i), and the other is the fixation
probability of transitional mutations relative to that of
transversional mutations, P(t_i)/P(v_i). The s/v ratio is expected
to increase with increasing u(i)/v(i), all else being equal. This
has been referred to as the differential mutation pressure
hypothesis of transition bias. Similarly, the s/v ratio is expected
to increase with increasing P(t_i)/P(v_i).
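Equation (3) can be evaluated directly once the mutation rates and fixation probabilities are given. The following is only a minimal sketch in Python: the function name sv_ratio and all numerical values are hypothetical, chosen to show how the two determinants act separately.

```python
def sv_ratio(u, v, p_t, p_v):
    """s/v ratio of equation (3).

    u[k], v[k]     : mutation rates of mutants carrying i = k + 1
                     transitions or transversions
    p_t[k], p_v[k] : the matching fixation probabilities
    The common factor 2N of equations (1) and (2) cancels in the ratio.
    """
    r_t = sum(i * ui * pt for i, (ui, pt) in enumerate(zip(u, p_t), start=1))
    r_v = sum(i * vi * pv for i, (vi, pv) in enumerate(zip(v, p_v), start=1))
    return r_t / r_v


# Hypothetical rates: transitions arise twice as often as transversions.
u = [2e-8, 1e-9]
v = [1e-8, 5e-10]
p_equal = [5e-4, 5e-4]

# Equal fixation probabilities: the ratio reflects mutation bias alone.
mutation_only = sv_ratio(u, v, p_equal, p_equal)
# Halving the fixation probability of transversions doubles the ratio.
with_selection = sv_ratio(u, v, p_equal, [2.5e-4, 2.5e-4])
print(mutation_only, with_selection)
```

With these made-up inputs the first ratio is the mutation bias u(i)/v(i) = 2, and the second is twice that, because P(t_i)/P(v_i) has been doubled.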
If synonymous mutations are neutral, then transitional mutations at
two-fold degenerate sites (which are synonymous) will have P(t_i)
= 1/2N, and transversional mutations at these sites will have a
fixation probability equal to
           1 - exp[-2*s(i)]
P(v_i) = -------------------- ..................................(4)
          1 - exp[-4*N*s(i)]
where s(i) is the effect of carrying i transversional mutations
(nonsynonymous) on the fitness of the mutant. If nonsynonymous
mutations are deleterious, then s(i) is negative. This gives an s/v
ratio as:
R_t            Sum (from 1 to n) u(i)*i
--- = --------------------------------------- ..................(5)
R_v     2N * Sum (from 1 to n) v(i)*P(v_i)*i
Thus, the stronger the purifying selection, the more negative s(i)
will become, which results in decreasing P(v_i) and R_v, and
consequently an increasing s/v ratio. This is what is referred to
as the purifying selection hypothesis of transition bias. We see
that the differential mutation pressure hypothesis and the
purifying selection hypothesis are two sides of the same coin.
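The purifying selection argument can be checked numerically. The sketch below is my own illustration, not a fitted model: it assumes every mutant carries a single change (i = 1), unbiased mutation (u = v), neutral transitions, and hypothetical values of N and s. The function names are also hypothetical.

```python
import math


def fixation_probability(s, N):
    """Equation (4): fixation probability of a new mutant with fitness effect s.

    Assumes |4*N*s| stays below roughly 700 so that exp() does not overflow.
    """
    if s == 0.0:
        return 1.0 / (2.0 * N)  # neutral case, the limit of equation (4)
    # (1 - exp(-2s)) / (1 - exp(-4Ns)), written with expm1 for accuracy
    return math.expm1(-2.0 * s) / math.expm1(-4.0 * N * s)


def sv_ratio_one_class(u, v, s, N):
    """Equation (3) with i = 1 only, neutral transitions, P(t) = 1/(2N),
    and transversions with fitness effect s."""
    return (u / (2.0 * N)) / (v * fixation_probability(s, N))


N = 1000  # hypothetical effective population size
selection = (0.0, -0.0005, -0.001, -0.002)  # hypothetical values of s
ratios = [sv_ratio_one_class(1.0, 1.0, s, N) for s in selection]
print(ratios)
```

The ratio starts at 1 when transversions are neutral and rises steadily as s becomes more negative, even though mutation itself is unbiased in this example.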
2. Other Factors Affecting s/v Bias:
a. Selection on codon bias:
The effect of codon bias on s/v ratio can be seen easily from
equation (3). With selection maintaining codon bias, synonymous
substitutions are no longer neutral, and their fixation probability
will be equal to
           1 - exp[-2*s(i)]
P(t_i) = -------------------- ..................................(6)
          1 - exp[-4*N*s(i)]
where s(i) is now the effect of carrying i transitional mutations
on the fitness of the mutant. P(t_i) (and R_t) will consequently
decrease with the intensity of purifying selection for codon bias.
Thus, we should expect to see little transitional bias at two-fold
degenerate sites in genes experiencing strong purifying selection
for codon bias.
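A small numerical sketch of this effect, under assumptions of my own: single-change mutants, unbiased mutation, and hypothetical selection coefficients s_t (against a synonymous transition to a non-preferred codon) and s_v (against a nonsynonymous transversion). Under these assumptions the s/v ratio of equation (3) reduces to P(t)/P(v).

```python
import math


def fixation_probability(s, N):
    """Fixation probability of a new mutant with fitness effect s
    (the form shared by equations (4) and (6)); assumes |4*N*s| < ~700."""
    if s == 0.0:
        return 1.0 / (2.0 * N)
    return math.expm1(-2.0 * s) / math.expm1(-4.0 * N * s)


N = 1000      # hypothetical effective population size
s_v = -0.002  # hypothetical cost of a nonsynonymous transversion

# Increasing strength of selection for codon bias (hypothetical values).
s_t_values = (0.0, -0.0005, -0.001, -0.002)
relative = [fixation_probability(s_t, N) / fixation_probability(s_v, N)
            for s_t in s_t_values]
print(relative)
```

The ratio P(t)/P(v) shrinks as selection on codon usage strengthens and reaches 1 when a synonymous transition is as costly as a transversion.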
b. DNA repair:
The effect of DNA repair on s/v ratio can also be seen clearly
through equations (4-5). We note that s(i) is expected to become
more negative with increasing i. For example, the effect of having
many nonsynonymous transversions on the fitness of the mutant is
expected to be more deleterious than that of having a single
transversion. Consequently, P(v_i) in equation (4) should decrease
with increasing i. When there is DNA repair, i in equations (4-5)
will be small, so that s(i) approaches zero, and P(v_i) approaches
1/2N (the fixation probability of a new neutral mutation). As a
consequence, R_t/R_v approaches the mutational ratio (which is 1 if
mutation is unbiased) when DNA repair diminishes i to small values.
In the absence of DNA repair, i in equations (4-5) can be large, so
that R_t in equation (5) will also be large. However, when i is
large, s(i) will be very negative. Consequently, P(v_i) in equation
(4) approaches zero for large i. In other words, mutants with a
large number of transversional mutations do not become fixed and
therefore do not contribute to the substitution rate of
transversions, whereas mutants carrying a large number of
transitional (synonymous) mutations remain neutral and can
contribute to the substitution rate of transitions. Thus, s/v ratio
will increase when i becomes large in the absence of DNA repair.
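The contrast between the two cases can be sketched numerically. Everything specific here is an assumption made for illustration: fitness effects are taken to be additive, s(i) = -i*s1; the same hypothetical mutation spectrum is used for transitions and transversions; and "repair" is represented simply by putting all mutants in the i = 1 class.

```python
import math


def fixation_probability(s, N):
    """Equation (4); assumes |4*N*s| stays below roughly 700."""
    if s == 0.0:
        return 1.0 / (2.0 * N)
    return math.expm1(-2.0 * s) / math.expm1(-4.0 * N * s)


def sv_ratio(spectrum, s1, N):
    """Equation (3) with neutral transitions, P(t_i) = 1/(2N), and
    transversions of fitness effect s(i) = -i*s1 (additive, an assumption).

    spectrum[k] is the relative mutation rate of mutants carrying
    i = k + 1 changes, used for both u(i) and v(i).
    """
    r_t = sum(i * m / (2.0 * N) for i, m in enumerate(spectrum, start=1))
    r_v = sum(i * m * fixation_probability(-i * s1, N)
              for i, m in enumerate(spectrum, start=1))
    return r_t / r_v


N = 1000     # hypothetical effective population size
s1 = 0.0005  # hypothetical cost of one nonsynonymous transversion

with_repair = [1.0, 0.0, 0.0, 0.0]         # every mutant carries one change
without_repair = [0.25, 0.25, 0.25, 0.25]  # up to four changes per mutant

ratio_with = sv_ratio(with_repair, s1, N)
ratio_without = sv_ratio(without_repair, s1, N)
print(ratio_with, ratio_without)
```

Under these assumptions the s/v ratio is larger for the spectrum without repair, because the multi-transversion mutants almost never fix while the multi-transition mutants remain neutral.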
Understanding the effect of DNA repair on s/v ratio helps to answer
a long-standing question concerning the difference in transition
bias between mitochondrial genes and nuclear genes. In sharp
contrast to the dramatic bias toward transitional substitutions in
animal mtDNA, transitions are generally found only one-half to
twice as often as transversions in interspecific comparisons of
nuclear genes (Vogel and Kopun 1977; van Ooyen et al. 1979; Fitch
1980; Gojobori et al. 1982). A major difference between mtDNA and
nuclear DNA is that the latter has several enzymatic systems for
DNA repair, whereas the former has none of its own, although
mitochondria have a limited importation of repair enzymes from the
cytoplasm (Myers et al. 1988; Satoh et al. 1988). For this reason,
s/v ratio is expected to be higher in protein-coding genes of mtDNA
than in those of nuclear genomes. This explanation of the
difference in s/v ratio between mtDNA and nuclear DNA, however, is
not unique. One can explain the difference equally well by
supposing strong selection for codon bias in nuclear DNA and little
selection for codon bias in mtDNA.
c. GC content:
Selection for biased GC content can also affect s/v ratio. Suppose
an extreme situation in which selection for GC pairs is so strong
that the gene is made entirely of GC. Now if a G or C mutates into
an A or T, the mutant will be eliminated by purifying selection. So
the only substitution that can be observed is of a G by a C or of a C
by a G, both being transversions. This implies that the s/v ratio will
simply be zero because transitions are selectively eliminated.
Thus, genes with extreme bias in GC content are expected to exhibit
a low s/v ratio.
3. Transition Bias at Nondegenerate Sites:
One might think that the s/v ratio at nondegenerate sites is
simpler than that at two-fold degenerate sites. It turns out that
the scenario is much more complicated and will have to wait for the
next posting.
Xuhua
--
=======================================================================
Xuhua Xia |
Museum of Natural Science | Phone: (504) 388-2841
119 Foster Hall | Fax : (504) 388-3075
Louisiana State University | Email: xuhua at unix1.sncc.lsu.edu
Baton Rouge, LA 70803 |
USA |
=======================================================================