Software to extract annotation fields from EMBL/GenBank entries.

Santi Garcia Vallve vallve at quimica.urv.es
Mon Jun 10 13:20:57 EST 1996


Brian Robertson wrote:
>
> The amount of bacterial genome data available as sequenced cosmids of
> 30-40 kb is increasing rapidly. Our problem is that we need to keep track
> of newly discovered genes as they appear, so they can be incorporated into
> our research program as appropriate. For this we need to create lists of
> probable genes identified in the annotations for each cosmid. This can
> then be circulated to laboratory workers.
>
> An example of this kind of annotation is shown below. We would like to
> extract the "/note" field, which contains the probable function of the
> gene, and create a list of these for each cosmid.
>
> FT   CDS_pept        complement(3043..4155)
> FT                   /note="MTCY190.03c, probable anthranilate
> FT                   phosphoribosyltransferase, trpD, len: 370, similar to eg
> FT                   SW:TRPD_LACCA P17170, (43.2% identity in 308 aa overlap),
> FT                   initiation codon uncertain, gtg at 4086 favoured by
> FT                   homology but this has no clear ribosome binding site"
>
> Does anyone know of a way of extracting this information from database
> entries and creating a list? Is there any software available that has
> this as one of its options, or would a shell script be needed?

If you have the entries on your own computer, I think the best
solution is to write your own program to do it. I have written a similar
one in FORTRAN77, and it is not difficult.
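
To give an idea of what such a program might look like, here is a
minimal sketch in Python (it is not the FORTRAN77 program mentioned
above). It assumes EMBL-style flat files in which feature lines start
with "FT", the feature key begins in column 6, and qualifiers such as
/note sit on indented "FT" continuation lines, as in the quoted example.
The function names extract_notes and list_notes are made up for this
illustration. GenBank-format entries and quotes doubled inside a note
are not handled.

```python
def extract_notes(lines):
    """Return (feature_key, location, note) for every /note qualifier.

    Assumes EMBL-style 'FT' lines: a new feature has its key in column 6,
    and qualifier lines are 'FT' followed by blanks and then the text.
    """
    results = []
    key = location = None
    note_parts = None  # holds pieces of a /note value while one is open

    for raw in lines:
        line = raw.rstrip("\n")
        if not line.startswith("FT"):
            continue  # not part of the feature table

        if len(line) > 5 and line[5] != " ":
            # A non-blank column 6 marks the start of a new feature.
            fields = line[5:].split(None, 1)
            key = fields[0]
            location = fields[1].strip() if len(fields) > 1 else ""
            note_parts = None
            continue

        text = line[2:].strip()
        if note_parts is None:
            if not text.startswith('/note="'):
                continue  # some other qualifier or a location continuation
            text = text[len('/note="'):]
            note_parts = []

        if text.endswith('"'):
            # Closing quote: the note is complete, so join its pieces.
            note_parts.append(text[:-1])
            results.append((key, location, " ".join(note_parts)))
            note_parts = None
        else:
            note_parts.append(text)

    return results


def list_notes(path):
    """Print one line per /note found in the flat file at 'path'."""
    with open(path) as handle:
        for key, location, note in extract_notes(handle):
            print(f"{key}\t{location}\t{note}")
```

Calling list_notes on a file holding one cosmid entry would print one
tab-separated line per annotated feature, which could then be
circulated as the list of probable genes for that cosmid.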



