Help with Raw GenBank Data

Anders Gorm Pedersen gorm at cbs.dtu.dk
Wed Jul 12 08:31:45 EST 2000


tanios at my-deja.com wrote:

> I want to calculate some statistics with data from GenBank. When I
> download data from GenBank, it is all in raw format (the number of
> fields varies), so I cannot use it directly.
> Do you know where I can find software that can manipulate this raw
> data (not only the sequences, but all the fields)? I tried to program a
> Word macro, but it's a bit hard.

If you really want to play around with the data, then I warmly recommend
moving to a UNIX (or Linux) platform and using, for instance, "nawk" and
"perl" scripts. It's not that hard to extract the sequence and other
relevant features once you get familiar with how GenBank files are
structured (I suggest you have a look at the release notes, which explain
feature keys etc.: ftp://ncbi.nlm.nih.gov/genbank/gbrel.txt).

If, however, you don't really want to learn to program (I can't blame
you), then you may find that the GCG package can help you some of the
way: it contains tools for converting between different sequence
formats and for computing various statistics, e.g.,
nucleotide/dinucleotide/trinucleotide composition, frequently occurring
"words", etc.
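For comparison, the composition statistics mentioned above are simple to compute by hand once you have a sequence string. The sketch below is a hand-rolled Python illustration, not how GCG does it: it counts overlapping k-letter words, so k=1, 2 and 3 give nucleotide, dinucleotide and trinucleotide composition, and the most common longer words give the frequently occurring "words". The function names kmer_counts and composition are hypothetical, and dropping words that contain ambiguity codes such as N is an assumption; other treatments are possible.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count overlapping k-letter words in seq; k=1, 2, 3 give mono-,
    di- and trinucleotide counts."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Assumption: words containing ambiguity codes (N, R, Y, ...) are dropped
    return Counter({w: n for w, n in counts.items() if set(w) <= set("ACGT")})


def composition(seq, k):
    """Relative frequency of each k-letter word (fractions sum to 1)."""
    counts = kmer_counts(seq, k)
    total = sum(counts.values())
    return {w: n / total for w, n in counts.items()} if total else {}
```

For frequently occurring words, something like `kmer_counts(seq, 6).most_common(10)` lists the ten most common hexamers. Counting overlapping words means a sequence of length L contributes L - k + 1 words, which is the usual convention for this kind of composition table.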

Good luck with your project!

Anders Gorm Pedersen


---------------------------------------------------------------
 Anders Gorm Pedersen, cand.scient., Ph.D.  (gorm at cbs.dtu.dk)

 Center for Biological Sequence Analysis
 Technical University of Denmark
 Bldg 208, DK-2800 Lyngby, Denmark

 phone: (+45) 45 25 24 84
 fax:   (+45) 45 93 15 85

 Web: http://www.cbs.dtu.dk/gorm/
---------------------------------------------------------------







More information about the Dros mailing list