Extraction of portions of GenBank flatfiles

Benedict Arnold bca at nildram.co.uk
Thu Jan 30 10:02:50 EST 2003


Hi Constantinos,

Yes, this is quite easy to do using SRS.

Here is an example using the following URL:

http://iubio.bio.indiana.edu/srs6bin/cgi-bin/wgetz?-id+7KUJq1KWZKD+-e+[GENBANK:'AB013077']

This returns an azurin sequence entry from GenBank (you'll probably have to add a final ] to the URL because of cut-and-paste messing about). If you scroll down the entry page you will notice that there are sections for CDS, source, gene etc. If you click on CDS (it is actually a hyperlink), you get a page containing just that sub-section of the complete entry. Due to some clever code in SRS, the sequence shown with this section of the entry is the sequence between the CDS start and end positions (which I think is what you're after). Then it's just a matter of cutting and pasting that sequence into whatever application or separate file you want.
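
If you would rather script the same thing against a saved flatfile than click through SRS, the idea can be sketched roughly as below. This is only a minimal Python sketch, not how SRS does it: the function names and the toy record are made up for illustration, and it assumes the feature has a simple "start..end" location (no join(...) or complement(...)) and that the sequence sits in the usual ORIGIN block terminated by //.

```python
import re

# Made-up toy record for illustration only; it is NOT the real AB013077 entry.
TOY_RECORD = """\
LOCUS       TOY000001                 30 bp    DNA     linear   SYN 01-JAN-2003
FEATURES             Location/Qualifiers
     source          1..30
     CDS             4..12
ORIGIN
        1 aaaatgcccg ggtaattttt cccccggggg
//
"""


def parse_origin(flatfile_text):
    """Return the sequence in the ORIGIN block as one lowercase string."""
    pieces = []
    in_origin = False
    for line in flatfile_text.splitlines():
        if line.startswith("ORIGIN"):
            in_origin = True
            continue
        if line.startswith("//"):
            break
        if in_origin:
            # Drop the leading position number and the spaces between blocks.
            pieces.append(re.sub(r"[^A-Za-z]", "", line))
    return "".join(pieces).lower()


def feature_ranges(flatfile_text, key="CDS"):
    """Yield (start, end) for feature lines with a simple start..end location."""
    pattern = re.compile(r"^ {5}" + re.escape(key) + r"\s+<?(\d+)\.\.>?(\d+)\s*$")
    for line in flatfile_text.splitlines():
        match = pattern.match(line)
        if match:
            yield int(match.group(1)), int(match.group(2))


def extract_feature(flatfile_text, key="CDS"):
    """Return the subsequence for every simple-location feature named `key`."""
    sequence = parse_origin(flatfile_text)
    # Flatfile positions are 1-based and inclusive; Python slices are 0-based.
    return [sequence[start - 1:end] for start, end in feature_ranges(flatfile_text, key)]


if __name__ == "__main__":
    print(extract_feature(TOY_RECORD))  # ['atgcccggg']
```

For a promoter region you would have to pick the coordinates yourself (for example some stretch upstream of the CDS start), since a promoter is often not annotated as its own feature.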

hope that helps

Benedict Arnold

>Is anyone aware of any software (not commercial) that can be used for
>extraction of portions of the flatfiles? (I think that's what the GenBank
>entries you get after a search are called.) For example, if a database
>flatfile entry has a reference to a coding sequence such as
>"CDS : 235 ... 1500 bp", is there software that can find the keyword "CDS"
>in the flatfile, and then read and return the string composed of the
>letters a c g t that lies between the numbers 235...1500 in the sequence
>at the end of the file? I am particularly interested in extracting
>promoter regions from whole gene entries of GenBank.
