Vector contamination in seq. data (Self-righteous tirade)

Dan Jacobson danj at
Sat Jul 17 08:44:45 EST 1993

In article <CAA1Jq.9t8 at> frist at () writes:
>I just tried to retrieve the Lambda phage sequence from
>retrieve at  I got back everything BUT Lambda!
>The reason is that vector contamination is now annotated in GenBank,
>and the email server performs a search on all annotation, so you get any
>entry that in any way refers to the sequence you requested. So when I
>tried retrieving with Lambda's unique accession number (J02459) I got
>so many vector-contaminated sequences that it exceeded the line limit that
>I had set at what I thought was a conservatively high figure, before it
>even got to Lambda itself!
>The purpose of this posting is not to gripe at NCBI. I have already sent
>suggestions to them on how I think this problem might be overcome. What 
>I want to do here is vent my spleen about sloppy sequencing!
>How can you not know that your sequence contains vector? You know who you
>are; yes I'm talking to YOU! Yes you over there with the 32P all over 
>your hands -- are you so sloppy that you don't even bother to look
>for the BEGINNING of your insert on each and every sequencing run?
> .....

I agree with Brian that vector contamination in the databases is a problem,
and that researchers should be more careful about the data that they submit
(and that some serious spleen venting is in order).
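As a rough illustration of the check Brian is asking for, here is a short Python sketch that looks for a known stretch of vector sequence flanking the cloning site at the 5' end of a read and reports where the insert begins. The flank sequence, the name find_insert_start, and the 8-base minimum overlap are all hypothetical choices for this example. It uses exact matching only, so real reads with sequencing errors in the vector region would need an alignment-based screen instead.

```python
# Hypothetical 16-base vector flank, made up for this example; it is not
# the sequence of any real cloning vector.
VECTOR_FLANK = "TTGACAGCTAGCTCAG"


def find_insert_start(read: str, vector_flank: str, min_overlap: int = 8) -> int:
    """Return the index in `read` where the insert begins, i.e. just past any
    vector sequence at the 5' end. Returns 0 if no vector is detected.

    Exact matching only: sequencing errors in the vector region will be missed.
    """
    read = read.upper()
    flank = vector_flank.upper()

    # Case 1: the whole flank occurs in the read, so the sequencing primer
    # sits upstream of it and the insert starts right after the flank.
    pos = read.find(flank)
    if pos != -1:
        return pos + len(flank)

    # Case 2: the read starts partway into the flank, so some suffix of the
    # flank is a prefix of the read. Try the longest overlap first, down to
    # min_overlap bases, to avoid trimming on short chance matches.
    for k in range(len(flank) - 1, min_overlap - 1, -1):
        if read.startswith(flank[-k:]):
            return k
    return 0


if __name__ == "__main__":
    read = "cc" + VECTOR_FLANK.lower() + "atgaaacgc"
    start = find_insert_start(read, VECTOR_FLANK)
    print(start, read[start:])  # 18 atgaaacgc
```

Trimming is then just read[find_insert_start(read, VECTOR_FLANK):]. The two cases cover a primer that sits upstream of the flank (the whole flank shows up in the read) and a primer that sits inside it (only the tail of the flank appears at the very start of the read).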

For retrieval, though, it has gotten easier over the past year and a half or
so to pull sequences out of the databases. A gopher search of GenBank takes a
matter of seconds; if your first search is too broad, you just refine it a
little and search again.
In this case, a simple search such as:

lambda and phage and genome

pulls the entry of interest right out. Or, if you do have the accession
number, a search for


also pulls the entry of interest right out in a matter of seconds.
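To see why the conjunctive query helps, here is a toy model of keyword searching over annotation, written in Python. It is not how the NCBI retrieve server or gopher actually indexed GenBank; the records HYPO_A and HYPO_B and all of the annotation text are invented to mimic the situation Brian describes, where contaminated entries cite J02459 in their annotation.

```python
import re

# Hypothetical mini-database: accession -> annotation text. HYPO_A and HYPO_B
# are invented contaminated entries that cite the Lambda accession number.
RECORDS = {
    "HYPO_A": "Human cDNA clone; 5' end contains lambda phage vector (J02459).",
    "HYPO_B": "Yeast gene fragment; vector contamination, see J02459.",
    "J02459": "Lambda phage, complete genome.",
}


def parse_query(query: str) -> list[str]:
    """Split a query such as 'lambda and phage and genome' into lowercase terms."""
    parts = re.split(r"\s+and\s+", query.strip(), flags=re.IGNORECASE)
    return [t.lower() for t in parts if t]


def search(records: dict[str, str], query: str) -> list[str]:
    """Return the IDs of records whose accession or annotation contains every
    query term as a whole word (case-insensitive)."""
    terms = parse_query(query)
    hits = []
    for acc, annotation in records.items():
        # Index the accession together with the annotation text, so an
        # accession search matches the entry itself as well as any entry
        # whose annotation cites it.
        words = set(re.findall(r"[a-z0-9]+", f"{acc} {annotation}".lower()))
        if all(term in words for term in terms):
            hits.append(acc)
    return hits


if __name__ == "__main__":
    print(search(RECORDS, "J02459"))                       # ['HYPO_A', 'HYPO_B', 'J02459']
    print(search(RECORDS, "lambda and phage and genome"))  # ['J02459']
```

Searching on J02459 alone returns every record that mentions the accession number, with the Lambda entry itself only one hit among several, whereas lambda and phage and genome returns just J02459: HYPO_A mentions lambda and phage but not genome, so the extra term filters it out.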

Best of luck,

Dan Jacobson

danj at
