Software to extract annotation fields from EMBL/GenBank entries.

Brian Fristensky frist at cc.umanitoba.ca
Fri Jun 7 12:35:34 EST 1996


Brian Robertson wrote:
> 
> The amount of bacterial genome data available as sequenced cosmids of
> 30-40 kb is increasing rapidly. Our problem is that we need to keep track
> of newly discovered genes as they appear, so they can be incorporated into
> our research program as appropriate. For this we need to create lists of
> probable genes identified in the annotations for each cosmid. This can
> then be circulated to laboratory workers.
> 
> An example of this kind of annotation is shown below. We would like to
> extract the "/note" field, which contains the probable function of the
> gene, and create a list of these for each cosmid.
> 
> FT   CDS_pept        complement(3043..4155)
> FT                   /note="MTCY190.03c, probable anthranilate
> FT                   phosphoribosyltransferase, trpD, len: 370, similar to eg
> FT                   SW:TRPD_LACCA P17170, (43.2% identity in 308 aa overlap),
> FT                   initiation codon uncertain, gtg at 4086 favoured by
> FT                   homology but this has no clear ribosome binding site"
> 
> Does anyone know of a way of extracting this information from database
> entries and creating a list? Is there any software available that has
> this as one of its options, or would a shell script be needed?

You might try the FEATURES program from the XYLEM package, which was
described in

      Fristensky, B. (1993) Feature expressions: creating and manipulating 
      sequence datasets. Nucl. Acids Res. 21:5997-6003.

FEATURES is a program that reads GenBank feature tables and extracts
the corresponding sequences, Feature expressions, and annotation
lines. FEATURES is a Unix program that can be run from the command
line, as a text-based interactive program, or from a GDE menu.

To see an example of how FEATURES works, and to retrieve the XYLEM package, 
see

http://home.cc.umanitoba.ca/~psgendb/FEATURES.html

XYLEM can also be downloaded from the 'psgendb' directory at ftp.cc.umanitoba.ca.
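If installing XYLEM is not practical, the narrow case in the question (collecting the /note text for each cosmid) could also be handled by a short script. The sketch below is a hypothetical helper, not part of XYLEM or FEATURES. It assumes EMBL feature-table lines start with "FT" and GenBank feature-table lines are indented with spaces, joins continuation lines with a single space, and does not handle EMBL's doubled-quote escape ("") inside qualifier text.

```python
import sys


def extract_notes(lines):
    """Collect the text of every /note qualifier in EMBL or GenBank feature lines.

    Hypothetical helper, not part of XYLEM. Assumes EMBL feature-table lines
    start with "FT" and GenBank feature-table lines are indented with spaces.
    Does not handle EMBL's doubled-quote escape ("") inside qualifier text.
    """
    notes = []
    pending = None  # pieces of a /note value that continues on later lines
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("FT"):
            body = line[2:].strip()   # EMBL: drop the "FT" line code
        elif line.startswith(" "):
            body = line.strip()       # GenBank: feature lines are indented
        else:
            continue                  # headers, "XX" separators, etc.
        if pending is not None:
            # Inside a multi-line /note: keep adding lines until the closing quote.
            pending.append(body)
            if body.endswith('"'):
                notes.append(" ".join(pending)[:-1])
                pending = None
        elif body.startswith('/note="'):
            value = body[len('/note="'):]
            if value.endswith('"'):
                notes.append(value[:-1])  # the whole note fits on one line
            else:
                pending = [value]
    return notes


def main(paths):
    # One tab-separated line per note: file name, then note text,
    # giving a per-cosmid list that can be circulated.
    for path in paths:
        with open(path) as handle:
            for note in extract_notes(handle):
                print(f"{path}\t{note}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Saved under a hypothetical name such as extract_notes.py, it could be run as `python3 extract_notes.py cosmid1.embl cosmid2.embl > notes.txt`. Applied to the example entry above, it would print the MTCY190.03c note as a single line.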

===============================================================================
Brian Fristensky                |  
Department of Plant Science     |  Best advice I've heard in a long time:
University of Manitoba          |  
Winnipeg, MB R3T 2N2  CANADA    |  "Don't confuse having a career with
frist at cc.umanitoba.ca        |   having a life."
Office phone:   204-474-6085    |  
FAX:            204-261-5732    |
http://home.cc.umanitoba.ca/~frist/
===============================================================================
