question about SRS5.0

Peter Rice pmr at sanger.ac.uk
Tue Dec 10 04:43:30 EST 1996


In article <32ACBDA7.57D9066B at pku.edu.cn> Tao Jiang <jiangt at pku.edu.cn> writes:
>   I downloaded the SRS version 5.0, and had a small test
>   yesterday. I installed only a very small part of EMBL
>   : phg.dat, and tried to index it. To my surprise, I
>   found it would take over 12 hours to index it (in fact,
>   the indexing hasn't finished yet), and too many index
>   files were generated: embl_fts_1.inx, embl_fts_2.inx, ...
>   , embl_569.inx, ... (I have modified srsdb.i and embl.i to
>   include only phg.dat of EMBL).
>
>   However, if I test it with swissprot, it seems to work well
>   (only one index file for every index).

The multiple index files are not a bug - they are a feature of SRS 5.0
(and a very welcome feature too).

SRS 4.0 needed to index all of EMBL in one go, which consumed terrible
amounts of memory.

SRS 5.0 splits EMBL into smaller chunks (defined in embl.i as
partSize:100000 for 100000 entries), and builds a separate index for
each.  There is a new "srsbuild -m" step which merges these smaller
indices together once they have been compressed.
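
To make the split-then-merge idea concrete, here is a minimal Python sketch.
It is not SRS code and does not reflect how srsbuild works internally: the
function names, the toy entries, and the tiny PART_SIZE (standing in for
partSize:100000) are all assumptions made for illustration. It only shows
why indexing in parts keeps memory bounded by the part size, with a final
merge pass over the already-sorted part indices.

```python
from collections import defaultdict
from heapq import merge
from itertools import islice

PART_SIZE = 3  # hypothetical stand-in for partSize:100000 in embl.i


def chunks(entries, part_size):
    """Yield successive lists of at most part_size entries."""
    it = iter(entries)
    while True:
        part = list(islice(it, part_size))
        if not part:
            return
        yield part


def build_part_index(part):
    """Build a sorted list of (keyword, entry_id) pairs for one part."""
    index = [(kw, entry_id) for entry_id, keywords in part for kw in keywords]
    index.sort()
    return index


def merge_indices(part_indices):
    """Merge sorted part indices into one keyword -> entry ids mapping."""
    merged = defaultdict(list)
    for kw, entry_id in merge(*part_indices):
        merged[kw].append(entry_id)
    return dict(merged)


# Toy entries: (entry id, keywords). Invented data, not real EMBL records.
entries = [
    ("E1", ["phage", "lambda"]),
    ("E2", ["phage"]),
    ("E3", ["t4"]),
    ("E4", ["lambda", "t4"]),
    ("E5", ["phage"]),
]

part_indices = [build_part_index(p) for p in chunks(entries, PART_SIZE)]
index = merge_indices(part_indices)
```

In this sketch each part index plays the role of one of the numbered .inx
files, and merge_indices plays the role of the merge step. Only one part
needs to be held in memory while it is being indexed.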

Having said that, I seem to have problems with the EMBL indexing
too, so I am still checking on what is happening. After 82500+ entries
from est1, I get:

> SRSDB:embl.is:70: error: insufficient memory - error during malloc,
> could not allocate "Cursor object"

This happened while indexing the EMBL flat files.

Of course, we have both been editing the embl.i file (and some others)
so maybe we did something wrong.
--
------------------------------------------------------------------------
Peter Rice                           | Informatics Division,
E-mail: pmr at sanger.ac.uk             | The Sanger Centre,
Tel: (44) 1223 494967                | Wellcome Trust Genome Campus,
Fax: (44) 1223 494919                | Hinxton, Cambridge, CB10 1SA,
URL: http://www.sanger.ac.uk/~pmr/   | England



