I am supervising a student project to examine whether pattern analysis
of DNA can be used to automatically derive taxonomies of organisms.
Amongst the patterns my student, Attilla Ting, is investigating are
n-gram models and codon variation.
As part of this process Attilla has developed a Perl script that takes
raw DNA data as input and calculates where the reading frames are. She
would like to validate her code. The easiest way to do this would be to
find a publicly available source of DNA with the reading frames
annotated. Even better would be a site that keeps the raw DNA in a file
separate from the extracted reading frames. She could then test
her code directly. Does anyone know where we can get hold of the
data we need?
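To make the kind of check concrete, here is a minimal sketch in Python (not Attilla's Perl script) of locating reading frames and comparing them with an annotated set. It rests on several assumptions: only the forward strand is scanned, a frame runs from an ATG to the next in-frame TAA, TAG or TGA, and coordinates are 0-based with an exclusive end. The names find_orfs and compare are hypothetical and chosen only for illustration.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}  # assumption: standard stop codons


def find_orfs(seq, min_codons=1):
    """Return sorted (start, end, frame) tuples for forward-strand ORFs.

    start is 0-based, end is exclusive and includes the stop codon.
    min_codons counts the codons before the stop, including the ATG.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i  # first ATG opens a candidate frame
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ATG in this frame
    return sorted(orfs)


def compare(predicted, annotated):
    """Split coordinates into matched, missed and extra sets."""
    predicted, annotated = set(predicted), set(annotated)
    return {
        "matched": sorted(predicted & annotated),
        "missed": sorted(annotated - predicted),
        "extra": sorted(predicted - annotated),
    }


if __name__ == "__main__":
    found = find_orfs("CCATGAAATGATT")
    print(found)
    print(compare(found, [(2, 11, 2)]))
```

With annotated coordinates from a public source in place of the hand-written list, the "missed" and "extra" entries would show where a script disagrees with the annotation.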
Attilla has been using GenBank to acquire her DNA data. We would also
welcome information on sources of raw DNA data that people would
consider useful (ideally we need complete DNA sequences) or any other
Please reply either directly to Attilla at cayakt at scs.leeds.ac.uk or to me.
Dr John Hughes
School of Computer Studies
University of Leeds, UK