The amount of bacterial genome data available as sequenced cosmids of
30-40 kb is increasing rapidly. Our problem is that we need to keep track
of newly discovered genes as they appear, so they can be incorporated into
our research program as appropriate. For this we need to create lists of
probable genes identified in the annotations for each cosmid. These lists can
then be circulated to laboratory workers.
An example of this kind of annotation is shown below. We would like to
extract the "/note" field, which contains the probable function of the
gene, and create a list of these for each cosmid.
FT   CDS_pept        complement(3043..4155)
FT                   /note="MTCY190.03c, probable anthranilate
FT                   phosphoribosyltransferase, trpD, len: 370, similar to eg
FT                   SW:TRPD_LACCA P17170, (43.2% identity in 308 aa overlap),
FT                   initiation codon uncertain, gtg at 4086 favoured by
FT                   homology but this has no clear ribosome binding site"
Does anyone know of a way of extracting this information from database
entries and creating a list? Is there any software available that has
this as one of its options, or would a shell script be needed?
If a shell script is required, can anyone help with writing one? I'm
afraid it's beyond my capabilities.
Thanks for your help.
Dr. Brian D. Robertson
Dept. Medical Microbiology
Imperial College School of Medicine at St Mary's
London W2 1PG, U.K.
b.robertson at ic.ac.uk
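
For anyone picking this up, the extraction asked for above could be sketched
roughly as follows. This is only an illustration in Python rather than a shell
script, and it rests on assumptions not stated in the post: that each
annotation line starts with "FT", that a note opens with /note=" and ends at
the first line finishing with a double quote, and that notes contain no
embedded quote characters. The function name extract_notes is made up for
this sketch.

```python
def extract_notes(lines):
    """Collect the text of every /note qualifier from EMBL-style FT lines.

    Assumes a note starts with /note=" and ends on the first line that
    finishes with a double quote (no embedded quotes inside the note).
    """
    notes = []
    current = None  # pieces of the note being collected, or None if outside a note

    for line in lines:
        if not line.startswith("FT"):
            current = None  # left the feature table; drop any unfinished note
            continue

        text = line[2:].strip()  # drop the "FT" code and surrounding whitespace

        if current is None:
            if not text.startswith('/note="'):
                continue  # a feature key or some other qualifier
            current = [text[len('/note="'):]]
        else:
            current.append(text)  # continuation line of the open note

        if current[-1].endswith('"'):
            current[-1] = current[-1][:-1]  # remove the closing quote
            notes.append(" ".join(piece for piece in current if piece))
            current = None

    return notes
```

Given the example entry above, this should return a one-item list holding the
whole note joined onto a single line. Running it over each cosmid file in turn
(for instance extract_notes(open("cosmid.embl")), where the file name is
hypothetical) and printing the results one per line would give a list that
could be circulated.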