In article <CDEz1A.7s3 at usenet.ucs.indiana.edu>
wfischer at bio.indiana.edu (Will Fischer) writes:
>I need to extract given pieces of sequence from a set of EMBL/GenBank
>flat file entries (as retrieved from NCBI's email server), using ranges
>defined in the features table. For example, I'd like to be able to
>extract, from a set of entries, the DNA sequence for each exon, or
>again for every complete CDS feature (all exons assembled).
>
>Surely not everyone does this manually?
>
>What software exists that can actually parse the (eminently parsible)
>joint features table format? Please post reviews of programs you have
>used, or mail me directly and I will summarize.

A few months ago I wanted all the CDS from Chlamydomonas sequences.
Because I didn't know of any software, I wrote a TurboPascal routine to
do it. However, there are a lot of strange ways in the feature table to
give the CDS (I found at least five different ways!). Therefore the
program isn't straightforward and is sometimes not very elegant. I think
you will be able to follow how it works, so I can mail it if you want.
It is a unit for TurboPascal 5.0 or greater and works on a DOS machine.
However, don't expect too much: human intervention is still necessary
for some sequences because their feature tables are contradictory. I
first screened for these sequences, and then the correct ones were
copied and analyzed by the same program. Email me if you want the
source code.
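
To illustrate why the CDS location field needs real parsing, here is a
minimal Python sketch (not the TurboPascal unit mentioned above) that
resolves a few common location forms: a plain range, a single base,
partial markers (< and >), join(...), and complement(...), including
nesting. The function names are hypothetical, the sequence is assumed to
be already read into a string, and anything else (order(...), references
to other entries, and so on) is rejected rather than guessed at.

```python
import re

# Complement table for unambiguous DNA bases only (an assumption of this sketch).
_COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def revcomp(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_top_level(text):
    """Split on commas that are not nested inside parentheses."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(text[start:i])
            start = i + 1
    parts.append(text[start:])
    return parts


def extract(location, seq):
    """Return the subsequence of seq described by a feature location string.

    Positions are 1-based and inclusive, as in the feature table.
    """
    loc = location.replace(" ", "")
    if loc.startswith("complement(") and loc.endswith(")"):
        return revcomp(extract(loc[len("complement("):-1], seq))
    if loc.startswith("join(") and loc.endswith(")"):
        inner = loc[len("join("):-1]
        return "".join(extract(part, seq) for part in split_top_level(inner))
    # Plain range "a..b" or single base "a"; partial markers are ignored.
    m = re.fullmatch(r"<?(\d+)(?:\.\.>?(\d+))?", loc)
    if m is None:
        raise ValueError("unsupported location: " + location)
    start = int(m.group(1))
    end = int(m.group(2) or start)
    return seq[start - 1:end]
```

With this, extract("join(1..3,7..9)", seq) assembles two exons into one
CDS, and extract("complement(1..6)", seq) gives the reverse strand.
Raising an error on unrecognized forms makes it easy to set aside the
entries that need checking by hand.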
Henk van de Kamer
CHLAMY at SARA.NL