Phillip San Miguel wrote:
> Is there a publicly available program or script that will
> count the number of Phred 20 bases (that is, the number of
> bases with quality scores of 20 or higher) for each sequence
> in a quality file generated by Phred?
You might want to contact Brent Ewing at UW about a program called
qrep. I think it gives the kind of output you want.
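If qrep is not to hand, the per-sequence count itself is easy to script.
What follows is only a minimal sketch, not qrep and not anything shipped
with phred. It assumes the quality file is FASTA-like: a header line
starting with ">" whose first word is the read name, followed by lines of
whitespace-separated integer quality values. The function name
count_hq_bases and the threshold parameter are made up for illustration.
The default counts scores of 20 or higher, as in the question.

```python
def count_hq_bases(lines, threshold=20):
    """Yield (read_name, count) for each record in a FASTA-like quality file.

    Assumption: header lines start with ">" and quality values are
    whitespace-separated integers. A base is counted when its score is
    greater than or equal to `threshold`.
    """
    name, count = None, 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, count  # emit the record just finished
            parts = line[1:].split()
            # first word after ">" is taken as the read name
            name, count = (parts[0] if parts else ""), 0
        elif name is not None:
            count += sum(1 for q in line.split() if int(q) >= threshold)
    if name is not None:
        yield name, count  # emit the last record
```

To use it on a file, open the file and loop over the result, for example
"for name, n in count_hq_bases(open(path)): print(name, n)", where path is
whatever your quality file is called. Pass threshold=21 if you want
strictly more than 20.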
> I have a couple of other questions about Phred: what are the
> scores generated by the -qr and how is a "High Quality Base"
> defined?
High quality bases are those with a phred q value of more than 20. I
think the numbers in the -qr report are the number of lanes that have
q>20 scores for the number of bases shown in the first column.
> I've noticed that Phred will generate a histogram
> of scores of some sort using the -qr qualifier.
I don't get a histogram with the version of phred ..... -qr <filename>
that I am using, but I get three when I run qrep. The first is the
percent of the dataset that has x total bases, the second is the
percent of the dataset that has the q values listed in the first
column, and the third is the percent of the reads in the dataset that
have x quality values. You also get the average total length of the
read, the total bases read (sum of total bases per lane), and the
average number of high quality (q>20) bases in the dataset.
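For anyone who wants to check those summary numbers by hand, here is a
rough sketch of how they could be computed. This is my own guess at the
arithmetic, not qrep's code. The function name summarize and the layout of
the input (a dict mapping read name to a list of integer quality values)
are assumptions for the example. It uses q>20, as described above.

```python
def summarize(reads, threshold=20):
    """Summarize a dataset given as {read_name: [quality values]}.

    A base is counted as high quality when its score is strictly
    greater than `threshold` (q>20 by default).
    """
    n_reads = len(reads)
    total_bases = sum(len(quals) for quals in reads.values())
    hq_bases = sum(
        sum(1 for q in quals if q > threshold) for quals in reads.values()
    )
    if n_reads == 0:
        # nothing to average over
        return {"reads": 0, "total_bases": 0,
                "avg_length": 0.0, "avg_hq_bases": 0.0}
    return {
        "reads": n_reads,
        "total_bases": total_bases,              # sum of bases per lane
        "avg_length": total_bases / n_reads,     # average read length
        "avg_hq_bases": hq_bases / n_reads,      # average q>20 bases per read
    }
```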
You can find most of this at
http://bozeman.mbt.washington.edu/phrap.docs/phred.html
Hope that helps. I think that the
srlasky
--
Stephen R. Lasky, Ph.D. #
University of Washington #
Department of Molecular Biotechnology #
HSB K324 box 357730 #
Seattle, WA, 98195 USA #
email: srlasky at u.washington.edu #
Phone: 206-616-5865 #
Fax: 206-685-7301 #
#########################################