difference between EMBL & GenBank

Tim Cutts tjrc1 at mole.bio.cam.ac.uk
Thu Jun 18 05:16:09 EST 1998

In article <tbu-1806980939210001 at tburmei.dialup.fu-berlin.de>,
Thomas Burmeister <tbu at gmx.net> wrote:
>Dear newsgroup-readers!
>I do not understand the difference between EMBL & GenBank. What exactly
>is the difference? Should I search both to get the most up-to-date
>(I hope this question is not too trivial! I apologize for this.)
>Thanks for any answer (please send a copy to my email-address: tbu at gmx.net) !

They are just different places to which researchers can submit their
data.  They are two of the three principal DNA sequence databases (the
third is DDBJ, in Japan).

All three exchange their sequence information with one another, but
at any given time each will contain some sequences not present in
the other two.

The database entries have 'accession numbers' which are unique to that
entry (and will be the same for that entry in each database).  You can
use these, therefore, to construct non-redundant databases comprising
all three major databases.  This is clearly much more effective (in
terms of both disk space usage and searching time) than having
multiple copies of 95% of the sequences.
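
As a rough illustration of the idea (not any particular package's
implementation), merging by accession number can be sketched in Python.
The function name merge_nonredundant, the dictionary layout and the
accession numbers below are all assumptions made up for the example.

```python
def merge_nonredundant(primary, secondary):
    """Merge two {accession: sequence} mappings.

    When an accession number appears in both, the copy from
    `primary` is kept and the one from `secondary` is skipped.
    """
    merged = dict(primary)
    for accession, sequence in secondary.items():
        # setdefault only stores the entry if the accession is new
        merged.setdefault(accession, sequence)
    return merged


# Toy data: invented accession numbers, not real entries.
embl = {"X00001": "ACGT", "X00002": "GGCC"}
genbank = {"X00002": "GGCC", "U00003": "TTAA"}

merged = merge_nonredundant(embl, genbank)
print(sorted(merged))
```

The shared entry is stored once, so the merged set holds three entries
rather than four.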

Using GCG, I keep a full copy of EMBL, and create a list of the
accession numbers in EMBL using the program 'accessionnumbers'.
Then, when formatting GenBank, I can give this list of
accession numbers to genbanktogcg, which automatically excludes any
entry in GenBank which I already have a copy of in EMBL.

My GenEMBL data farms in GCG therefore have everything in GenBank and
EMBL, without any sequence duplication.
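
For anyone without GCG, the same two steps can be approximated on the
flat files directly.  The following is only a sketch under stated
assumptions, not what the GCG programs do internally: it assumes EMBL
flat-file entries carry their accession numbers on "AC" lines, GenBank
entries carry theirs on an "ACCESSION" line, and both end an entry with
"//".  The function names and the toy records are hypothetical.

```python
import re


def embl_accessions(lines):
    """Collect every accession number found on EMBL 'AC' lines."""
    accessions = set()
    for line in lines:
        if line.startswith("AC "):
            # AC lines look like: "AC   X00001; S00009;"
            accessions.update(a for a in re.split(r"[;\s]+", line[2:]) if a)
    return accessions


def filter_genbank(lines, known):
    """Yield GenBank entries (lists of lines) whose primary accession
    number is not in `known`.  Only the first accession on the
    ACCESSION line is checked; secondary ones are ignored here."""
    entry, primary = [], None
    for line in lines:
        entry.append(line)
        if line.startswith("ACCESSION") and primary is None:
            fields = line.split()
            primary = fields[1] if len(fields) > 1 else None
        if line.startswith("//"):
            if primary not in known:
                yield entry
            entry, primary = [], None


# Toy, heavily abbreviated records with invented accession numbers.
embl_text = """ID   TOY1
AC   X00001; S00009;
//
""".splitlines()

genbank_text = """LOCUS       TOY1
ACCESSION   X00001
//
LOCUS       TOY2
ACCESSION   U00003
//
""".splitlines()

known = embl_accessions(embl_text)
kept = list(filter_genbank(genbank_text, known))
print(len(kept))
```

Collecting the EMBL accession numbers into a set first keeps the
GenBank pass to a single read of the file with a cheap lookup per entry.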


Dr T J R Cutts                                        Tel: +44 1223 333596
Dept. of Biochemistry, 80 Tennis Court Rd.
Cambridge, CB2 1GA, UK
