[Arabidopsis] re: pseudogene or not

Charles S. Gasser via arab-gen@net.bio.net (by csgasser from ucdavis.edu)
Wed Oct 15 19:01:39 EST 2008

Dear jiangyqcn,

You wrote:

>Hi colleagues,
>I recently cloned some genes of Brassica napus by RT-PCR/RACE 
>through mining ESTs in GenBank.

Note - if you did RT-PCR, you did not clone "genes", you cloned 
"cDNAs". This is a very important distinction that you should be 
careful to make.

>The primers were designed based on ESTs. After sequencing more than 
>2 clones for each gene,  I found, for quite a few genes, obviously, 
>alleles exist. The most common is nucleotide substitution, next 
>indel(insertion/deletion). An interesting phenomenon I found is for 
>those alleles with a long insertion compared to the other allele, 
>the insertion is not 3-folds, i.e. the insertion leads to premature 
>stop codon in the insertion or immediately after the insertion. This 
>means that the translated amino acid sequence of one allele is 
>normal (compared to At homolog), while the other is much shorter 
>(premature stop codon).
>My question is why the insertion is not 3-folds and they are, for 
>example 79bp, 100bp, etc? Are the alleles bearing the 79bp, 100bp 
>insertion pseudogenes? BTW, I used high-fidelity polymerase when 
>cloning those genes and, sequenced the clones from two ends with the 
>same sequence, which excludes sequencing errors. Although RTase is 
>error-prone, I have no way to predict this.

What you see is common, and most of your data likely do not reflect 
either alleles or pseudogenes.  The base substitutions you see could 
potentially represent alleles or homeologous genes.  There should be 
two different sequences because B. napus appears to be an 
allotetraploid of B. oleracea and B. rapa, so you would get 
sequences from each. But it is also quite likely that some 
substitutions represent errors introduced by RT-ase or during PCR.

The insertions are almost certainly retained introns.  These can 
result from the presence in your RNA samples of nuclear RNA 
precursors that have not yet been fully spliced (notably, in many 
cases poly-A addition can happen prior to splicing, so oligo dT will 
still work to prime these transcripts for RT).  In addition, it is 
not uncommon for partially spliced transcripts to erroneously leave 
the nucleus.  If they have internal stop codons they should be 
subject to nonsense-mediated decay and should, in theory, be at 
relatively low frequency, but we and others see such partially 
spliced cDNA products for many different genes.  You can check this 
theory by looking for "GT" at the start of the insert and "AG" at the 
end of the insert, consistent with their being introns.  Also, your 
insert sizes of >65 bases are consistent with their being introns. 
Thus, you can get multiple different partially spliced cDNAs from a 
single gene - with no need to invoke the existence of pseudogenes.
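
The check described above can be sketched in a few lines of Python. 
This is only an illustration, not a validated tool: the function name, 
the returned field names, and the example sequence are made up here, 
and the 65-base threshold is simply taken from the ">65 bases" remark 
above. It assumes the insert has already been extracted from the 
alignment and is given in the sense-strand orientation. Note that 
alignment programs can place an insertion boundary a base or two off, 
so a failed GT/AG test may just mean the boundaries need to be shifted.

```python
def check_intron_like(insert_seq, min_len=65):
    """Report simple intron-like features of an inserted sequence.

    insert_seq: the extra bases found in the longer cDNA, written
    5'->3' on the sense strand (an assumption of this sketch).
    min_len: hypothetical size threshold, taken from the ">65 bases"
    remark in the post above.
    """
    # Normalize case and treat RNA-style U as T.
    seq = insert_seq.strip().upper().replace("U", "T")
    return {
        "length": len(seq),
        # Canonical introns begin with GT and end with AG.
        "starts_with_GT": seq.startswith("GT"),
        "ends_with_AG": seq.endswith("AG"),
        "longer_than_min": len(seq) > min_len,
        # An insert whose length is not a multiple of 3 shifts the
        # reading frame downstream of the insertion.
        "multiple_of_3": len(seq) % 3 == 0,
    }


if __name__ == "__main__":
    # Made-up 79-base insert, for illustration only.
    example = "GT" + "A" * 75 + "AG"
    for key, value in check_intron_like(example).items():
        print(key, value)
```

An insert that starts with GT, ends with AG, and exceeds the size 
threshold is consistent with a retained intron, but the test is only 
suggestive; comparing the cDNA against genomic sequence is the more 
direct confirmation.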

For example, for one gene family, in one maize EST database we found 
nearly 90% of the cDNA sequences to include retained introns.

>Could someone provide any explanation?

See above.


You are welcome.

Chuck Gasser
U. C. Davis (and arab-gen moderator)
