Human DNA on a computer hard disk

Tony Schountz schountzwa at bioax1.bio.ornl.gov
Mon May 18 09:25:53 EST 1998


Leonard Bardi wrote:

> I am a student in Information Systems at the university of Geneva.
>
> I'd like to know how much hard disk space would be necessary to store an
> entire human genome.
>
> If I have understood correctly, in all our 46 chromosomes we have 3 * 10^9
> bases. Each base is an A, C, G or T.
>
> Are there other elements that should be stored? Would it be enough?
>
> So, a human genome would take less than 1 GB of disk space:
>     - 4 letters can be encoded on 2 bits
>     - 2 * (3 * 10^9) / 8 / 1024 / 1024 =  715 MB
>
> Is that right ???

Well, sort of.  Your arithmetic holds for one sequence, but you have to keep in
mind all the polymorphisms in the human population.  For example, there are
over 100 Hb alpha alleles.  Also, the MHC is represented by dozens of alleles
at each locus.  Which ones are the "right" ones?

Additionally, genomic sequences often contain repeats.  Such repeats can be
squeezed into a smaller amount of space on the drive, in much the same way that
data and image files are compressed.  How much space would be required to store
this sequence:

TTTTTTTTTTTTTTTT

Without compression: 2 bits per "T" x 16 = 32 bits (4 bytes).  With
compression: 2 bits for T, plus 4 bits to let the compression program know it
occurs 16 times (stored as the count minus one, since 4 bits only hold 0
through 15) = 6 bits.  Of course, you now consume processor time encoding and
decoding the compressed data.
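
The numbers in this thread can be checked with a short Python sketch.  It is
only an illustration, not a real genome file format: the function names, the
4-bit run counter, and the choice to store each run length as count minus one
(so that a run of 16 fits in 4 bits) are all assumptions made for this example.

```python
BITS_PER_BASE = 2  # A, C, G, T: four symbols fit in 2 bits


def packed_size_bits(n_bases):
    """Size in bits of n_bases stored at a fixed 2 bits per base."""
    return n_bases * BITS_PER_BASE


def run_length_encode(seq):
    """Collapse a sequence into (base, run_length) pairs."""
    runs = []
    for base in seq:
        if runs and runs[-1][0] == base:
            runs[-1] = (base, runs[-1][1] + 1)
        else:
            runs.append((base, 1))
    return runs


def rle_size_bits(runs, count_bits=4):
    """Size in bits if each record is 2 bits of base plus count_bits of length.

    Assumption: the length is stored as (count - 1), so 4 bits cover runs of
    1 to 16.  Longer runs are split into several records.
    """
    max_run = 1 << count_bits
    total = 0
    for _base, length in runs:
        records = -(-length // max_run)  # ceiling division
        total += records * (BITS_PER_BASE + count_bits)
    return total


# The figure from the original question: 3 * 10^9 bases at 2 bits each.
genome_mib = packed_size_bits(3 * 10**9) / 8 / 1024 / 1024
print(round(genome_mib))  # 715

# The 16-T example: 32 bits packed, 6 bits run-length encoded.
poly_t = "T" * 16
print(packed_size_bits(len(poly_t)))             # 32
print(rle_size_bits(run_length_encode(poly_t)))  # 6

# A sequence without repeats gets larger under this scheme.
print(packed_size_bits(4))                       # 8
print(rle_size_bits(run_length_encode("ACGT")))  # 24
```

The last two lines show the trade-off: this simple scheme only pays off where
runs are long, and it costs 6 bits per base where every base differs from its
neighbour.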



