Phred vs. ABI base caller

Phillip San Miguel pmiguel at
Tue Dec 7 05:31:15 EST 1999

    I ran the ABI sequencing standards ("BigDye Terminator Sequencing
Standard with AmpliTaq DNA polymerase, FS") on our 3700 recently. I
brought up 3 tubes of the standard in formamide, mixed them and
aliquoted them into a 96 well PE plate, denatured at 95 oC for 5 minutes
on a PE 9700 thermal cycler and chilled to 4 oC. I centrifuged the plate
for 5 minutes in a Beckman Allegra 6R and placed it on the deck to be
run on the 3700 using its standard sequencing module (except with the
sensors turned off). The samples were run 4 times. Call the first run
"0 hrs"; the second, third and fourth runs followed 4.2 hrs, 8.2 hrs,
and about 57 hrs later. I deal only with the first and third runs here
except for a few comments below.

    Using the "Autoscore" program included in the 3700 software, I scored
the sequences generated by the ABI basecaller and those generated by
Phred. I set the "tolerance score index" to 1% to emulate Phred q>20
scores. The scores below are the number of bases called at 99% accuracy.
This is actual accuracy, since Autoscore compares the test sequences to
the standard sequence.
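
    The 1% setting follows from the standard definition of a Phred
quality score, Q = -10 * log10(P), where P is the estimated probability
that a base call is wrong. A minimal sketch of that conversion (the
function names are only illustrative, they are not part of Phred or
Autoscore):

```python
import math

def phred_to_error_prob(q):
    """Error probability implied by a Phred quality score: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Phred quality score for a per-base error probability: Q = -10*log10(P)."""
    return -10 * math.log10(p)

# A score of 20 corresponds to a 1% error rate, i.e. 99% accuracy.
for q in (10, 20, 30):
    print(q, phred_to_error_prob(q))
```

So a base with a Phred score of 20 is one that Phred expects to be wrong
1 time in 100, which is why a 1% tolerance is the matching setting.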

                                1st run        3rd run
ABI basecaller
   median                       781  bases     738
   mean                         724            607
   std dev.                     173            261
Phred
   median                       789            716
   mean                         698            552
   std dev.                     242            312

    Two points here. The 1st is that Phred does a good job of calling
bases even using chromatograms generated from samples loaded in
formamide and run at 50 oC on a machine that didn't exist when Phred was
written. The 2nd point is that the decrease in sequence quality over 8
hours is not enormous. I guess there is a 3rd point--excellent sequence
from the ABI sequence standard! An average of 724 bases called at 99%
accuracy.
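
    To put a number on "not enormous", here is a small sketch that
computes the percentage drop from the 1st run to the 3rd run for the
figures in the table. It assumes the first block of the table is the ABI
basecaller and the second is Phred, which is what the means quoted in
the text (724 and 698) suggest.

```python
# Bases called at 99% accuracy, copied from the table: (1st run, 3rd run).
# Assumption: first block = ABI basecaller, second block = Phred.
scores = {
    ("ABI", "median"): (781, 738),
    ("ABI", "mean"): (724, 607),
    ("Phred", "median"): (789, 716),
    ("Phred", "mean"): (698, 552),
}

def percent_drop(first, third):
    """Percentage of the 1st-run score lost by the 3rd run."""
    return 100.0 * (first - third) / first

drops = {key: percent_drop(*runs) for key, runs in scores.items()}

for (caller, stat), drop in drops.items():
    print(f"{caller:5s} {stat:6s} {drop:4.1f}% fewer bases in the 3rd run")
```

The means fall by roughly 16% (ABI) and 21% (Phred) over the 8 hours,
while the medians fall by only about 6% and 9%.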
    How did Phred rate its own ability to call bases with these
chromatograms? Here are the Phred q>20 counts (that is, the number of
bases with a phred score of 20 or higher):

Phred q>20     1st run               3rd run
   median      445  bases            343
   mean        400                   325
   std dev.    112                   105

    Even though Phred did a good job of calling bases, it doesn't place
much confidence in its calls. On average in the first run it called 698
bases at 99% accuracy, but only gives itself credit for 400 (median 445).
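
    As a rough illustration of how conservative that is, the sketch
below divides the mean Phred q>20 count by the mean number of Phred
bases that Autoscore found to be 99% accurate. The two measures are not
defined in exactly the same way, so treat the ratio as a ballpark figure
only.

```python
# Mean values copied from the two tables above (Phred-called sequences only).
phred_actual_mean = {"1st run": 698, "3rd run": 552}  # Autoscore, 99% accuracy
phred_q20_mean = {"1st run": 400, "3rd run": 325}     # bases Phred scored >= 20

# Fraction of the demonstrably accurate bases that Phred itself rated q>=20.
credited = {
    run: phred_q20_mean[run] / phred_actual_mean[run]
    for run in phred_actual_mean
}

for run, fraction in credited.items():
    print(f"{run}: Phred credits itself with {fraction:.0%} of its accurate bases")
```

In both runs Phred claims a bit under 60% of the bases it actually got
right.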

    The other runs.
    The samples run at 4 hours were intermediate in quality between those
run at 0 and 8 hrs. Nothing surprising there.
    The samples run at 57 hours (that is after 57 hours at room temp,
open to the air) basically gave no sequence. Examination of the
chromatograms revealed a high degree of degradation (broadening) in
their "G" peaks. The A, C and T peaks looked okay. Since the molecules
present in the A, C and T peaks contain plenty of "G" bases, I would
guess that this chemistry problem has nothing to do with "G", per se,
but rather something to do with the fluor with which "G" is labeled.
    The next obvious experiment was to run water-loaded samples. I
attempted to do this but my 3700 fatal errored out 2 nights in a row on
the 1st run of the night. I guess I need a new bottle of polymer. Alas,
the sequencing standard has now been at room temp, open to the air, for
two nights, so it won't be a fair comparison. Last time I checked this
standard cost about $2000, so I don't see this experiment getting done.
    Please post any comments you have.
Phillip San Miguel
Purdue Genomics Center

