I'm trying to collate DNA data from a particular bacterium.
When calculating GC content, is it wise to use only coding
sequence? (There is a marked codon usage bias.) It would
seem that including a large amount of flanking DNA could unduly
bias the numbers, e.g. alternating py tracts or terminator sequences.
I realise that gross figures for the whole genome are
sometimes quoted (from physical methods), but what is the consensus
on deriving the number from sequence data? Surely the constraints
are only selected for in the coding regions? Can one thus include
non-translated RNAs in the analysis?
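
To make the comparison concrete, here is a minimal Python sketch of the kind of calculation I have in mind: GC content computed separately for coding sequence, for flanking DNA, and for everything pooled. The function names, the (start, end) coordinate format and the toy sequence are all made up for illustration; this is not real data from the organism and not an established method.

```python
def gc_fraction(seq):
    """Return the fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    # Ambiguity codes such as N are ignored; an empty count gives NaN.
    return gc / total if total else float("nan")


def split_by_regions(seq, coding_regions):
    """Split seq into concatenated coding and flanking parts.

    coding_regions is a list of (start, end) pairs, 0-based and
    end-exclusive, assumed not to overlap (hypothetical input format).
    """
    coding, flanking = [], []
    pos = 0
    for start, end in sorted(coding_regions):
        flanking.append(seq[pos:start])   # DNA before this coding region
        coding.append(seq[start:end])     # the coding region itself
        pos = end
    flanking.append(seq[pos:])            # DNA after the last coding region
    return "".join(coding), "".join(flanking)


# Toy sequence invented for illustration only (not real data).
seq = "ATATATAT" + "GCGCGGCCATGC" + "TTTTAAAA"
coding, flanking = split_by_regions(seq, [(8, 20)])

print(f"coding GC:   {gc_fraction(coding):.3f}")
print(f"flanking GC: {gc_fraction(flanking):.3f}")
print(f"pooled GC:   {gc_fraction(seq):.3f}")
```

Non-translated RNA genes could be passed in as a third set of coordinates in the same way, so that their effect on the pooled figure can be seen directly rather than assumed.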
Any opinions welcome.
University of York