total number of bases?

EMBnet Switzerland embnet at comp.bioz.unibas.ch
Wed May 4 16:16:07 EST 1994


...
: > I can't seem to find a reliable current estimate of the total 
: > number of bases in all the different sequences stored in 
: > readily accessible databases (genbank, embl...). Also, 

Stephane, 
the numbers of sequences you have on AEOLUS (your computer running GCG) 
should be as follows: 
EMBL 
       894 bb.seq
     16665 em_ba.seq
     33315 em_est.seq
      5676 em_fun.seq
     10873 em_in.seq
      5139 em_om.seq
      5462 em_or.seq
      5599 em_ov.seq
       967 em_ph.seq
      8005 em_pl.seq
     30542 em_pr.seq
     19695 em_ro.seq
      5318 em_sy.seq
      4872 em_un.seq
     15649 em_vi.seq
      3116 patent.seq
    171787 total

GENBANK exclusion set (GENBANK 82 minus EMBL 38, with GCG) 
       212 gb_ba.seq
       356 gb_est.seq
       220 gb_in.seq
        97 gb_om.seq
       111 gb_ov.seq
         0 gb_pat.seq
         1 gb_ph.seq
       223 gb_pl.seq
       657 gb_pr.seq
       250 gb_ro.seq
        66 gb_st.seq
        28 gb_sy.seq
        17 gb_un.seq
       274 gb_vi.seq
         1 gbphg.seq
      2513 total

and the weekly updates from EMBnet Switzerland 

       901 gb_new.seq   - all new GENBANK not in the EMBL updates 
      7717 xembl.seq    - all really new EMBL entries 
       7250 xxembl.seq   - all entries updated by EMBL since the last release 

I wouldn't use the base-pair numbers quoted below for statistics, though: 
the data are counted per ACCESSION number and therefore contain a lot of 
redundancy. 
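
As a quick sanity check on the listing above, the per-file counts can be summed and compared with the stated totals. This is only a minimal sketch in Python: the numbers are copied from the listing, while the variable and function names are made up for illustration.

```python
# Per-file sequence counts copied from the listing above.
embl_counts = {
    "bb.seq": 894, "em_ba.seq": 16665, "em_est.seq": 33315,
    "em_fun.seq": 5676, "em_in.seq": 10873, "em_om.seq": 5139,
    "em_or.seq": 5462, "em_ov.seq": 5599, "em_ph.seq": 967,
    "em_pl.seq": 8005, "em_pr.seq": 30542, "em_ro.seq": 19695,
    "em_sy.seq": 5318, "em_un.seq": 4872, "em_vi.seq": 15649,
    "patent.seq": 3116,
}
genbank_exclusion_counts = {
    "gb_ba.seq": 212, "gb_est.seq": 356, "gb_in.seq": 220,
    "gb_om.seq": 97, "gb_ov.seq": 111, "gb_pat.seq": 0,
    "gb_ph.seq": 1, "gb_pl.seq": 223, "gb_pr.seq": 657,
    "gb_ro.seq": 250, "gb_st.seq": 66, "gb_sy.seq": 28,
    "gb_un.seq": 17, "gb_vi.seq": 274, "gbphg.seq": 1,
}


def total_sequences(counts):
    """Return the sum of the per-file sequence counts."""
    return sum(counts.values())


embl_total = total_sequences(embl_counts)
genbank_extra = total_sequences(genbank_exclusion_counts)

print("EMBL total:", embl_total)                      # 171787
print("GENBANK exclusion set total:", genbank_extra)  # 2513
print("Combined:", embl_total + genbank_extra)        # 174300
```

Both sums agree with the "total" lines in the listing. The weekly update files are left out here, since they change from week to week.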


: The size of current releases of GenBank and EMBL is:
: GenBank Release 82 (15 April 1994): 180,589,455 bases; 169,896 sequences;
: EMBL Library Release 38  (March 1994): 179,346,566 bases; 171,787 sequences;

: The NCBI maintains a non-redundant database daily updated
:   nr    Non-redundant PDB+GBUpdate+GenBank+EmblUpdate+EMBL:

: 5:07 AM EDT May 3 1994: 184,980,203 bases; 173,749 sequences;


The HASSLE server of EMBnet Switzerland recalculates an 'nr' for both 
proteins and DNA on a weekly basis. Last Saturday, based on EMBL with 
GenBank added and EMBL updates with GenBank updates added, we had 
slightly less than the figures reported above (but this was from April 30). 
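
To illustrate what "EMBL with GenBank added" amounts to, here is a rough sketch in Python of an accession-based merge. It is not the actual HASSLE or NCBI procedure: the function names and the toy entries are hypothetical, and it assumes two entries count as the same record exactly when their accession numbers match.

```python
def exclusion_set(primary, secondary):
    """Entries of `secondary` whose accession number is absent from `primary`."""
    return {acc: seq for acc, seq in secondary.items() if acc not in primary}


def merge_non_redundant(primary, secondary):
    """Keep all of `primary` and add only the exclusion set of `secondary`."""
    merged = dict(primary)
    merged.update(exclusion_set(primary, secondary))
    return merged


# Hypothetical toy data: accession number -> sequence.
embl = {"X00001": "ACGT", "X00002": "GGCC"}
genbank = {"X00002": "GGCC", "M00003": "TTAACG"}

nr = merge_non_redundant(embl, genbank)
total_bases = sum(len(seq) for seq in nr.values())
print(len(nr), "sequences;", total_bases, "bases")
```

A merge like this only drops entries that share an accession number. Identical or overlapping sequences filed under different accession numbers are still counted separately, which is why such totals remain redundant.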

(specifically to Stephane) 
Unfortunately, the host you use runs a TCP/IP product (Wollongong) which 
HASSLE doesn't support at the moment, but the time may come when you 
run UCX, Multinet, or TCPware. If you need an account on the 
EMBnet Switzerland UNIX cluster, let me know. 
(for all) 
HASSLE is available for most flavours of UNIX and for the VMS 
implementations of IP mentioned above. Contact us for details: both client 
and server modes are supported, in full source. Services within EMBnet running 
via HASSLE are BLAST, (T)FASTA, PROFILE and S&W search (via the Bioccelerator 
at EMBnet Israel at Weizmann/Rehovot) and MOWSE (from EMBnet UK at Daresbury). 
MEDFETCH is a first SRS-type gateway to ENTREZ-based Swissprot, and 
FETCH gets database entries in GCG format. 



Regards
Reinhard Doelz
EMBnet Switzerland

-- 
 
+----------------------------------+-------------------------------------+
|     EMBnet SWITZERLAND           | RFC     embnet at comp.bioz.unibas.ch  |
|      Biocomputing                | (small) FTP and GOPHER server       |



More information about the Embl-db mailing list