Phred scores and Multiple/Consensus reads

jkb at mrc-lmb.cam.ac.uk jkb at mrc-lmb.cam.ac.uk
Thu Mar 21 07:45:27 EST 2002


In <a6r0l9$1p9$1 at mercury.hgmp.mrc.ac.uk> Laurence Hall <hall at RUBICONGENOMICS.com> writes:

> Given that each base at Phred 20 has, by definition, a 1% chance of
> being wrong, is the error rate for the resulting consensus 1% of 1%, in
> other words 1 in 10,000, that is Q40? If not, how is the resulting Phred
> score computed?

Adding the scores isn't entirely the answer, even if we assume the two strands
are entirely independent experiments. What if one strand says A and the
other T? Do we then subtract them?

Our solution (in the Staden Package) is to use simple Bayesian statistics with
a flat prior. I.e., if a base call has confidence 20, then we need to take into
account that the called base has probability .99 and each of the other three
has probability .00333 (the 1% error split evenly between them).

Hence two agreeing strands give:

.99*.99 / (.99*.99 + 3*.00333*.00333) = .999966
(phred score 44.7)

Conversely, if we have A at .99 on one strand and C at .99 on the other, then
we get:

A/C = .99*.00333 / (.99*.00333*2 + .00333*.00333*2) = .498

That makes sense: almost a half chance of A, almost a half chance of C, and a
minuscule chance that it's something else.
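
The two calculations above can be reproduced with a few lines of Python. This
is only an illustrative sketch, not the Staden Package code: the function names
(base_probs, consensus, to_phred) are invented here, the reads are assumed to
be independent, and the error probability is assumed to be split evenly over
the three uncalled bases.

```python
import math

BASES = "ACGT"

def base_probs(call, q):
    """Spread one Phred score over the four bases.

    The called base gets 1 - e and each other base gets e / 3,
    where e = 10^(-q/10) is the error probability.
    """
    e = 10 ** (-q / 10)
    return {b: (1 - e) if b == call else e / 3 for b in BASES}

def consensus(reads):
    """Posterior probability of each base under a flat prior.

    reads is a list of (called_base, phred_score) pairs that are
    assumed to be independent observations of the same position.
    """
    likelihood = {b: 1.0 for b in BASES}
    for call, q in reads:
        p = base_probs(call, q)
        for b in BASES:
            likelihood[b] *= p[b]
    total = sum(likelihood.values())
    return {b: likelihood[b] / total for b in BASES}

def to_phred(p):
    """Convert a probability of being correct into a Phred score."""
    return -10 * math.log10(1 - p)

agree = consensus([("A", 20), ("A", 20)])   # both strands say A
clash = consensus([("A", 20), ("C", 20)])   # strands disagree

print(round(agree["A"], 6), round(to_phred(agree["A"]), 1))
print(round(clash["A"], 3), round(clash["C"], 3))
```

With two agreeing Q20 calls this gives roughly .999966 (about Phred 44.7), and
with conflicting A and C calls it gives roughly .498 for each, matching the
figures worked out by hand.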

James

--
James Bonfield (jkb at mrc-lmb.cam.ac.uk)   Fax: (+44) 01223 213556
Medical Research Council - Laboratory of Molecular Biology,
Hills Road, Cambridge, CB2 2QH, England.
Also see Staden Package WWW site at http://www.mrc-lmb.cam.ac.uk/pubseq/
