Program to count Phred 20 bases

Stephen R. Lasky srlasky at u.washington.edu
Mon Oct 18 04:45:32 EST 1999


Phillip San Miguel wrote:
> 
> Is there a publicly available program or script that will
> count the number of Phred 20 bases (that is, the number of
> bases with quality scores of 20 or higher) for each sequence
> in a quality file generated by Phred?

You might want to contact Brent Ewing at UW about a program called
qrep.  I think it gives the kind of output you want.
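
If qrep is not at hand, a short script can do the count directly.  What
follows is only a minimal Python sketch, not qrep and not part of phred.
It assumes the quality file is FASTA-style (a '>' header line per read,
followed by whitespace-separated integer scores).  The names
count_high_quality and min_quality are made up for illustration.  It
uses the "20 or higher" definition from the question.

```python
import io


def count_high_quality(handle, min_quality=20):
    """Return {read_name: number of bases with quality >= min_quality}.

    Assumes a FASTA-style quality file: '>' header lines, then
    whitespace-separated integer quality values (an assumption, see above).
    """
    counts = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The read name is taken as the first token after '>'.
            tokens = line[1:].split()
            name = tokens[0] if tokens else ""
            counts[name] = 0
        elif name is not None:
            # Quality values may wrap over several lines per read.
            counts[name] += sum(1 for q in line.split() if int(q) >= min_quality)
    return counts


if __name__ == "__main__":
    # Tiny made-up example; replace with open("yourfile.qual") for real data.
    sample = io.StringIO(">read1\n30 25 20 19 5\n>read2\n10 10 40\n")
    for read, n in count_high_quality(sample).items():
        print(f"{read}\t{n}")
```

To count only bases strictly above 20, pass min_quality=21.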

> 
> I have a couple of other questions about Phred: what are the
> scores generated by the -qr and how is a "High Quality Base"
> defined?  

High quality bases are those with a phred q value of more than 20.  I
think the numbers in the -qr report are the number of lanes that have
the number of q>20 bases shown in the first column.

> I've noticed that Phred will generate a histogram
> of scores of some sort using the -qr qualifier. 

I don't get a histogram with the version of phred ..... -qr <filename>
that I am using, but I get three when I run qrep.  The first is the
percent of the dataset that has x total bases, the second is the
percent of the dataset that has the q values listed in the first column,
and the third is the percent of the reads in the dataset that have x
quality values.  You also get the average total read length, the total
bases read (sum of total bases per lane), and the average number of high
quality (q>20) bases per read in the dataset.
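
For reference, the summary figures just described (total bases, average
read length, average number of high quality bases per read) can be
sketched as follows.  This is an illustrative Python sketch and not how
qrep computes them; the summarize function and its dictionary input are
assumptions.  With min_quality=20 a base scoring exactly 20 is counted;
pass 21 for a strict q>20 cutoff.

```python
def summarize(reads, min_quality=20):
    """Summarize a dataset given {read_name: [quality values]}.

    Hypothetical helper for illustration; not taken from qrep or phred.
    """
    n_reads = len(reads)
    total_bases = sum(len(quals) for quals in reads.values())
    total_high = sum(
        sum(1 for q in quals if q >= min_quality) for quals in reads.values()
    )
    return {
        "reads": n_reads,
        "total_bases": total_bases,
        # Guard against an empty dataset to avoid dividing by zero.
        "mean_read_length": total_bases / n_reads if n_reads else 0.0,
        "mean_high_quality_bases": total_high / n_reads if n_reads else 0.0,
    }


if __name__ == "__main__":
    # Made-up quality values for two reads.
    example = {"a": [30, 20, 10], "b": [5, 25, 40, 19, 21]}
    print(summarize(example))
```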

You can find most of this at
http://bozeman.mbt.washington.edu/phrap.docs/phred.html


Hope that helps.

srlasky

-- 
Stephen R. Lasky, Ph.D.			#
University of Washington		#
Department of Molecular Biotechnology	#
HSB K324 box 357730			#
Seattle, WA, 98195 USA			#
email:	srlasky at u.washington.edu	#
Phone: 	206-616-5865			#
Fax:	206-685-7301			#
#########################################



