# statistical question on sequencing project

Tom Anderson via methods@net.bio.net (by ucgatan from ucl.ac.uk)
Mon Jul 28 05:17:29 EST 2008

```
On Sun, 27 Jul 2008, WS wrote:

> Dear Experts,

That's not me, but i'll contribute my P = 0.02 anyway ...

> stupid scientist wants to quantify some gene expression in his samples
> with pyrosequencing. The technology he plans to use offers him the
> possibility to identify a certain number (N) of transcripts per sample
> (at least 10.000).
>
> He has some ideas on how many different transcripts there might be
> (about 100) in one sample. Now, he is looking for a method that tells
> him how to calculate the number of sequences he has to obtain per sample
> in order to identify transcripts of which the expression is different
> from sample to sample.
>
> Example: If there is a transcript present at 1% in one sample, how many
> transcripts does he need to identify and count in order to detect a
> change in its relative expression of e.g. 100% (i.e. presence of 2% or
> 0.5% in another sample) with a certain probability (say p<0.05)?
>
> statistical problem is named etc.)

I *think* this is a matter of what's called 'statistical power':

http://en.wikipedia.org/wiki/Statistical_power

However, this is a fairly complicated subject, and i can't really tell you
any more than that. You could try the sci.math.stat newsgroup, or asking

I suspect you might need to know something about the error characteristics
of the counting method, which will mean running each sample multiple times
and calculating mean and standard deviation or similar.

tom

--
Tom Anderson, MRC Laboratory for Molecular Cell Biology, UCL, London WC1E 6BT
(t) +44 (20) 76797264   (f) +44 (20) 76797805   (e) thomas.anderson from ucl.ac.uk

```
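
To make the 'statistical power' pointer above concrete, here is a minimal sketch of one standard way to size such an experiment: the normal-approximation sample-size formula for a two-sided test comparing two proportions. It is not from the original thread. The function name `reads_per_sample` and the 80% power default are illustrative assumptions, and the calculation assumes pure binomial counting noise with the same number of reads in each sample, which is exactly the kind of assumption that replicate runs would need to check.

```python
from math import ceil, sqrt
from statistics import NormalDist


def reads_per_sample(p1, p2, alpha=0.05, power=0.80):
    """Approximate reads needed in EACH sample to tell p1 from p2.

    Normal-approximation formula for a two-sided, two-proportion z-test.
    Assumes pure binomial counting noise and equal read depth per sample
    (hypothetical helper, not part of the original discussion).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value of the two-sided test
    z_power = z.inv_cdf(power)          # quantile for the desired power
    p_bar = (p1 + p2) / 2               # pooled proportion under the null
    null_sd = sqrt(2 * p_bar * (1 - p_bar))
    alt_sd = sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    n = ((z_alpha * null_sd + z_power * alt_sd) / (p1 - p2)) ** 2
    return ceil(n)


if __name__ == "__main__":
    # The example from the question: a transcript at 1% doubling or halving.
    for p2 in (0.02, 0.005):
        n = reads_per_sample(0.01, p2)
        print(f"1% -> {p2:.1%}: about {n} reads per sample")
```

Under these assumptions the answer comes out at a few thousand reads per sample, with the halving case needing more than the doubling case. Testing around 100 transcripts at once would call for a stricter `alpha` (for example a Bonferroni-style 0.05/100), and any run-to-run variability beyond counting noise would push the required depth higher still.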