Conrad Halling chh9 at kimbark.uchicago.edu
Fri Jul 23 12:05:09 EST 1993

In article <16C10F1D6.A428ENDE at HASARA11.SARA.NL> 
     A428ENDE at HASARA11.SARA.NL writes:

>Because I am not familiar with the GCG programs I wrote my own program to 
>parse the feature table of sequences and so tabulate the codon usage in the 
>coding strand. This works well but I found out that there is a strange 
>discrepancy between sequences. About half of them contained in the CDS 
>parameters also the stop codon, but the other half stopped just before the 
>stop codon. Why is this?

I asked this question myself a few months ago because I, too, have written a 
program that counts codons.

The answer is that EMBL's old definition of the CDS (coding sequence) did NOT
include the stop codon.  The new standard definition does.  Unfortunately,
EMBL is behind on fixing the old sequences.  Many of these sequences have
been transferred to GenBank, so there are some GenBank entries for which
the CDS feature does not include the stop codon.

The only solution for now is to count the codons, then check whether the
last codon in the CDS range is a stop codon.  If it isn't, check the codon
immediately after the end of the range to see if it's a stop codon.
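
As a rough illustration of that check, here is a minimal Python sketch.  It
is not the program I described; the function name and arguments are
hypothetical, and it assumes the standard genetic code (TAA, TAG, TGA), a
CDS on the forward strand in a single contiguous range, and 1-based
inclusive coordinates as in the feature table.

```python
# Assumption: standard genetic code; other codes use different stop codons.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def cds_end_including_stop(seq, start, end):
    """Return the CDS end position (1-based, inclusive) that includes the
    stop codon, or None if no stop codon is found at or just after the end.

    seq   -- the full nucleotide sequence of the entry (DNA letters)
    start -- first base of the CDS range (1-based)
    end   -- last base of the CDS range (1-based, inclusive)
    """
    seq = seq.upper()
    if end - start + 1 < 3:
        return None  # range too short to hold even one codon

    # Last codon inside the annotated range (new-style entries).
    if seq[end - 3:end] in STOP_CODONS:
        return end

    # Codon immediately after the annotated range (old-style entries).
    if seq[end:end + 3] in STOP_CODONS:
        return end + 3

    return None
```

A return value of None means the range probably needs to be looked at by
hand.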

As a warning, you will have to check that there is only one stop codon
in a CDS range.  You should also check that, when there is exactly one
stop codon, it is at the end of the CDS range.  There are several
entries in which the CDS range is incorrect.
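
The sanity checks in the warning above could be sketched along these lines.
Again, this is only an illustration under stated assumptions: the names are
hypothetical, the standard genetic code is assumed, and the input is the CDS
nucleotide string already extracted in reading frame from the coding strand.

```python
# Assumption: standard genetic code; other codes use different stop codons.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def in_frame_stops(cds):
    """Return the 0-based codon indices of every in-frame stop codon."""
    cds = cds.upper()
    return [i // 3
            for i in range(0, len(cds) - 2, 3)
            if cds[i:i + 3] in STOP_CODONS]


def check_cds(cds):
    """Return a list of problems found in the CDS (empty if none).

    A CDS with no stop codon at all is not reported here, because entries
    using the old definition legitimately end just before the stop codon.
    """
    problems = []
    if len(cds) % 3 != 0:
        problems.append("length is not a multiple of 3")

    stops = in_frame_stops(cds)
    last_codon = len(cds) // 3 - 1
    if len(stops) > 1:
        problems.append("more than one in-frame stop codon")
    elif len(stops) == 1 and stops[0] != last_codon:
        problems.append("stop codon is not the last codon")
    return problems
```

Any entry for which check_cds returns a non-empty list is a candidate for an
incorrect CDS range and is best excluded from the codon counts until it has
been checked.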

Conrad Halling
c-halling at uchicago.edu
