total number of bases?
EMBnet Switzerland
embnet at comp.bioz.unibas.ch
Wed May 4 16:16:07 EST 1994
...
: > I can't seem to find a reliable current estimate of the total
: > number of bases in all the different sequences stored in
: > readily accessible databases (genbank, embl...). Also,
Stephane,
the number of sequences which you have on AEOLUS (your computer running GCG)
should be the following:
EMBL
894 bb.seq
16665 em_ba.seq
33315 em_est.seq
5676 em_fun.seq
10873 em_in.seq
5139 em_om.seq
5462 em_or.seq
5599 em_ov.seq
967 em_ph.seq
8005 em_pl.seq
30542 em_pr.seq
19695 em_ro.seq
5318 em_sy.seq
4872 em_un.seq
15649 em_vi.seq
3116 patent.seq
171787 total
GENBANK exclusion set (GENBANK 82 - EMBL 38 with GCG)
212 gb_ba.seq
356 gb_est.seq
220 gb_in.seq
97 gb_om.seq
111 gb_ov.seq
0 gb_pat.seq
1 gb_ph.seq
223 gb_pl.seq
657 gb_pr.seq
250 gb_ro.seq
66 gb_st.seq
28 gb_sy.seq
17 gb_un.seq
274 gb_vi.seq
1 gbphg.seq
2513 total
and the weekly updates from EMBnet Switzerland
901 gb_new.seq - all new GENBANK not in the EMBL updates
7717 xembl.seq - all really new EMBL entries
7250 xxembl.seq - all entries updated by EMBL wrt last release
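As a quick sanity check, the per-file counts above can be summed and compared with the stated totals. The following is only a minimal sketch: the listing is copied from this message, while the function name `check_totals` and the parsing rules (a non-numeric line starts a section, a "total" line closes it) are assumptions made for illustration.

```python
# Counts copied from the listing in this message.
LISTING = """\
EMBL
894 bb.seq
16665 em_ba.seq
33315 em_est.seq
5676 em_fun.seq
10873 em_in.seq
5139 em_om.seq
5462 em_or.seq
5599 em_ov.seq
967 em_ph.seq
8005 em_pl.seq
30542 em_pr.seq
19695 em_ro.seq
5318 em_sy.seq
4872 em_un.seq
15649 em_vi.seq
3116 patent.seq
171787 total
GENBANK exclusion set
212 gb_ba.seq
356 gb_est.seq
220 gb_in.seq
97 gb_om.seq
111 gb_ov.seq
0 gb_pat.seq
1 gb_ph.seq
223 gb_pl.seq
657 gb_pr.seq
250 gb_ro.seq
66 gb_st.seq
28 gb_sy.seq
17 gb_un.seq
274 gb_vi.seq
1 gbphg.seq
2513 total
"""


def check_totals(listing):
    """Return {section: (computed_sum, stated_total)} for a count listing.

    Assumed layout: a line not starting with a number opens a section,
    'N name' lines carry per-file counts, and 'N total' closes the section.
    """
    results = {}
    section, running = None, 0
    for line in listing.splitlines():
        parts = line.split()
        if not parts:
            continue
        if not parts[0].isdigit():
            section, running = line.strip(), 0   # new section header
        elif parts[1] == "total":
            results[section] = (running, int(parts[0]))
        else:
            running += int(parts[0])
    return results


totals = check_totals(LISTING)
for name, (computed, stated) in totals.items():
    status = "OK" if computed == stated else "MISMATCH"
    print(f"{name}: computed {computed}, stated {stated} [{status}]")
```

Both sections add up to their stated totals, and the EMBL figure agrees with the 171,787 sequences quoted for EMBL Release 38.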
I wouldn't use the base-pair numbers quoted below for statistics, though:
the data are based on ACCESSION numbers and therefore contain a lot of
redundancy.
: The size of current releases of GenBank and EMBL is:
: GenBank Release 82 (15 April 1994): 180,589,455 bases; 169,896 sequences;
: EMBL Library Release 38 (March 1994): 179,346,566 bases; 171,787 sequences;
: The NCBI maintains a non-redundant database daily updated
: nr Non-redundant PDB+GBUpdate+GenBank+EmblUpdate+EMBL:
: 5:07 AM EDT May 3 1994: 184,980,203 bases; 173,749 sequences;
The HASSLE server of EMBnet Switzerland recalculates an 'nr' for both
proteins and DNA on a weekly basis. Last Saturday, based on EMBL with
GenBank added and EMBL updates with GenBank updates added, we had
slightly less than the figures reported above (but this was from April 30).
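To illustrate the idea of "EMBL with GenBank added", here is a minimal sketch of an accession-based merge. It is not the HASSLE procedure, which is not described here: the function names, the dictionary layout, and the toy accessions and sequences are all hypothetical.

```python
def exclusion_set(primary, secondary):
    """Entries of `secondary` whose accession number is absent from `primary`."""
    return {acc: seq for acc, seq in secondary.items() if acc not in primary}


def merge_nonredundant(primary, secondary):
    """Keep all of `primary`, then add only the exclusion set of `secondary`."""
    merged = dict(primary)
    merged.update(exclusion_set(primary, secondary))
    return merged


# Hypothetical toy data: accession number -> sequence.
embl = {"X00001": "ACGT", "X00002": "GGCC"}
genbank = {"X00002": "GGCC", "M00003": "TTAA"}

nr = merge_nonredundant(embl, genbank)
n_sequences = len(nr)
n_bases = sum(len(seq) for seq in nr.values())
print(n_sequences, "sequences;", n_bases, "bases")
```

Note that matching on accession numbers alone only removes entries present in both sources under the same accession; identical sequences stored under different accessions would still be counted twice.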
(specifically to Stephane)
Unfortunately, the host you use runs a TCP/IP product (Wollongong) which
doesn't support HASSLE at the moment, but the time may come when
you support UCX, Multinet, or TCPware. If you need an account on the
EMBnet Switzerland UNIX cluster, let me know.
(for all)
HASSLE is available for most flavours of UNIX and the VMS emulations
of IP mentioned above. Contact us for details - both client and server
mode are supported in full source. Services within EMBnet running via
HASSLE are BLAST, (T)FASTA, PROFILE and S&W search (via the Biocellerator
at EMBnet Israel at Weizmann/Rehovot) and MOWSE (from EMBnet UK at Daresbury).
MEDFETCH is a first SRS type gateway to ENTREZ-based Swissprot, and
FETCH gets database entries in GCG format.
Regards
Reinhard Doelz
EMBnet Switzerland
--
+----------------------------------+-------------------------------------+
| EMBnet SWITZERLAND | RFC embnet at comp.bioz.unibas.ch |
| Biocomputing | (small) FTP and GOPHER server |