What Genomes Have Been Sequenced?
gsbs1022 at UTSPH.SPH.UTH.TMC.EDU
Tue Oct 13 08:27:01 EST 1992
robison1 at husc10.harvard.edu (Keith Robison)
>>As Monte Carlo simulation shows, about 24 of these ORFs are expected
>>to be found by chance (in a random sequence of length 300,000 with
>>the same base frequencies as in yeast III chromosome: A=0.31, T=0.30,
>>G=0.19, and C=0.20).
> Curiosity: in the ChrIII paper, the claim was made that
>ORFs of >100 amino acids "have 0.2% probability of occurring by
>chance in S. cerevisiae DNA." Is this consistent with the above
>reference given? (I haven't looked it up yet.)
> Sharp & Crowe. Yeast (1991) 7:657-678.
I haven't read the paper, but the Monte Carlo estimate
can be supported by a simple calculation:
With equal base contents, 3 of the 64 codons are stops, so the average ORF
length, counting the stop codon, is 64/3 = 21.3 codons.
The expected number of ORFs (of any size, including zero length) in one
reading frame of 300,000/3 = 100,000 codons is 100,000/(64/3) = 4687.5.
The probability of an ORF of size L (L sense codons before the stop) is
p(1-p)^L, where p = 3/64 is the probability that a codon is a stop.
The probability of an ORF of size L >= 100 is (1-p)^100 = 0.008222163.
The expected number of long ORFs (L>=100) is 4687.5*0.008222163 = 38.54
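Written out, the steps above rest on one idealization: each codon in a reading frame is independently a stop with probability p = 3/64. N = 100,000 is the number of codons in one frame. Under that assumption the spacing between stops is geometric:

```latex
$$
\begin{aligned}
P(L) &= p\,(1-p)^{L}, \qquad L = 0, 1, 2, \dots \\
\mathbb{E}[L+1] &= \sum_{L=0}^{\infty} (L+1)\,p\,(1-p)^{L} = \frac{1}{p} = \frac{64}{3} \approx 21.3 \\
\mathbb{E}[N_{\text{ORF}}] &\approx \frac{N}{\mathbb{E}[L+1]} = N p = 100{,}000 \cdot \frac{3}{64} = 4687.5 \\
P(L \ge 100) &= \sum_{L=100}^{\infty} p\,(1-p)^{L} = (1-p)^{100} = \left(\tfrac{61}{64}\right)^{100} \approx 0.008222 \\
\mathbb{E}[N_{L \ge 100}] &\approx N p\,(1-p)^{100} \approx 4687.5 \times 0.008222 \approx 38.54
\end{aligned}
$$
```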
Considering the complementary strand doubles this amount, to about 77.
This is of the same order as the Monte Carlo estimate, 24*2 = 48.
The difference might be due to unequal base usage, boundary effects,
whether the stop codon is counted as part of the ORF, etc.
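To see how much unequal base usage alone matters, the same estimate can be recomputed with the base frequencies as a parameter. This is a minimal sketch with the following assumptions:

* bases are independent, with no codon bias;
* the stop codons are TAA, TAG and TGA;
* there are 100,000 codons per reading frame;
* the complementary strand's frequencies are obtained by swapping A/T and G/C;
* boundary effects are ignored.

The function names are illustrative only. The 100,000 codons used in the text correspond to a single reading frame, so the number of frames per strand is a separate (hypothetical) parameter. A value of 1 reproduces the text; 3 counts every frame.

```python
"""Expected number of chance ORFs of at least `min_len` codons in a random sequence."""

STOP_CODONS = ("TAA", "TAG", "TGA")


def stop_probability(freqs: dict[str, float]) -> float:
    """Probability that a random codon is a stop, assuming independent bases."""
    return sum(freqs[a] * freqs[b] * freqs[c] for a, b, c in STOP_CODONS)


def complement(freqs: dict[str, float]) -> dict[str, float]:
    """Base frequencies of the complementary strand (A<->T, G<->C)."""
    return {"A": freqs["T"], "T": freqs["A"], "G": freqs["C"], "C": freqs["G"]}


def expected_long_orfs(freqs: dict[str, float], seq_len: int = 300_000,
                       min_len: int = 100, frames_per_strand: int = 1) -> float:
    """Expected count of ORFs with >= min_len sense codons, summed over both strands.

    Per frame: (number of stops) * P(L >= min_len) = N * p * (1 - p) ** min_len,
    with N = seq_len // 3 codons per frame (boundary effects ignored).
    """
    codons_per_frame = seq_len // 3
    total = 0.0
    for strand in (freqs, complement(freqs)):
        p = stop_probability(strand)
        total += frames_per_strand * codons_per_frame * p * (1 - p) ** min_len
    return total


EQUAL = {"A": 0.25, "T": 0.25, "G": 0.25, "C": 0.25}
YEAST_III = {"A": 0.31, "T": 0.30, "G": 0.19, "C": 0.20}  # frequencies quoted above

if __name__ == "__main__":
    print(f"equal bases, 1 frame/strand:  {expected_long_orfs(EQUAL):.2f}")
    print(f"equal bases, 3 frames/strand: {expected_long_orfs(EQUAL, frames_per_strand=3):.2f}")
    print(f"yeast III, 3 frames/strand:   {expected_long_orfs(YEAST_III, frames_per_strand=3):.2f}")
```

With equal frequencies and one frame per strand, this gives about 77.08, the figure in the text. Counting all three frames per strand triples that to about 231. With the chromosome III frequencies, stop codons become more common: p is about 0.064 on one strand and 0.065 on the other, instead of 0.047. Long ORFs become correspondingly rarer, and three frames per strand give roughly 49 in total. Under these assumptions that is close to the 24*2 = 48 above.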
So the probability of finding such long ORFs by chance is not as small
as it might seem at first glance.
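For comparison, the Monte Carlo approach mentioned in the quoted text can be sketched as follows: draw random sequences with given base frequencies and count long ORFs directly. This is a minimal sketch, not the simulation used in the cited work. It makes these assumptions:

* an ORF is a run of at least 100 sense codons ending in a stop;
* runs cut off by the end of the sequence are not counted;
* only the three frames of the forward strand are scanned;
* a fixed seed is used for reproducibility.

The function names, trial count and seed are illustrative.

```python
import random

STOP_CODONS = {"TAA", "TAG", "TGA"}


def count_long_orfs(seq: str, min_len: int = 100) -> int:
    """Count stop-terminated runs of >= min_len sense codons in the three forward frames."""
    count = 0
    for frame in range(3):
        run = 0  # sense codons seen since the previous stop (or the sequence start)
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                if run >= min_len:
                    count += 1
                run = 0
            else:
                run += 1
    return count


def simulate(freqs: dict[str, float], seq_len: int = 300_000,
             trials: int = 10, seed: int = 1) -> float:
    """Average number of long ORFs over `trials` random sequences (forward strand only)."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    bases = list(freqs)
    weights = [freqs[b] for b in bases]
    total = 0
    for _ in range(trials):
        # Each base is drawn independently with the given frequencies.
        seq = "".join(rng.choices(bases, weights=weights, k=seq_len))
        total += count_long_orfs(seq)
    return total / trials


if __name__ == "__main__":
    yeast_iii = {"A": 0.31, "T": 0.30, "G": 0.19, "C": 0.20}  # frequencies quoted above
    print(f"mean long ORFs per sequence, forward strand: {simulate(yeast_iii):.1f}")
```

Under this model, the expected count for the three forward frames is about 25. The printed average should therefore come out near that, and near the "about 24" quoted above, with some scatter that depends on the seed and the number of trials. Scanning the complementary strand as well would require reverse-complementing each sequence first.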
More information about the Bio-soft