new Genbank slows srs indexer to crawl

Thure Etzold etzold
Thu Jun 15 11:44:00 EST 1995


gilbertd at sunflower.bio.indiana.edu (Don Gilbert) wrote:
>
>
>The latest release of GenBank has put SRSbuild here into a slow
>crawl.  It has been trying to index this release for over 48 hours.
>It makes progress, but at a snail's pace.  My guess is that
>this is a memory-limit problem.  The previous GenBank build on this
>machine took probably less than 6 hours.
>
>The machine that is indexing has (merely) 32MB of real memory, and 
>about 100mb of virtual memory.  Is this slowness likely due to
>srs having to use virtual memory extensively?  Or might this indicate
>some other problem?  There are some machines here w/ more memory,
>but it will be harder for me to run srsindex on them.
>

SRS indexing needs physical memory: performance degrades enormously once
the process starts swapping! You can reduce memory consumption by building
the GenBank indices in several rounds instead of all at once.

The 'srscheck' command has an option '-s' that lets you limit the memory
requirement (in kb). srscheck estimates the size of all the indices from
the values "/maxIndexSizeKb" and "/alloc_factor=0.8" in genbank.sdl.
These numbers are not always kept up to date, so they give only a rough
estimate. Even so, you can use '-s' to force srscheck to split index
building into separate rounds. Here is an example:

> srscheck -l genbank -xdir /trash/etzold/ -o /dev/tty -s 100000

gives me

srsbuild GENBANK -f ' ID DAT ACC DEF KEY ORG AUT TIT REF FTS SL' -xdir \
'/trash/etzold/' -odir 'SRSINX:' -env 'unix'



> srscheck -l genbank -xdir /trash/etzold/ -o /dev/tty -s 10000

gives

srsbuild GENBANK -f ' ID DAT ACC' -xdir '/trash/etzold/' -odir 'SRSINX:' -env \
'unix'
srsbuild GENBANK -f ' DEF KEY ORG' -xdir '/trash/etzold/' -odir 'SRSINX:' -env \
'unix'
srsbuild GENBANK -f ' AUT TIT REF' -xdir '/trash/etzold/' -odir 'SRSINX:' -env \
'unix'
srsbuild GENBANK -f ' FTS SL' -xdir '/trash/etzold/' -odir 'SRSINX:' -env 'unix'
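To show how a memory limit can turn into rounds like these, here is a rough
Python sketch. It is NOT srscheck's actual algorithm. It assumes a simple
greedy packing that keeps the indices in order and starts a new round
whenever the next index would push the running estimate past the '-s'
budget. The per-index sizes (3000 kb each) are made up for illustration,
and split_rounds and srsbuild_commands are hypothetical helpers. srscheck
computes its own estimates from genbank.sdl.

```python
# Hypothetical sketch: greedily pack indices into srsbuild rounds so the
# estimated memory of each round stays within a kb budget (similar in
# spirit to srscheck -s). The size estimates are assumed inputs here.

def split_rounds(sizes, limit_kb):
    """Group (field, estimated_kb) pairs, in order, into rounds."""
    rounds, current, used = [], [], 0
    for field, kb in sizes:
        # Start a new round if adding this index would exceed the budget.
        # An index larger than the budget still gets a round of its own.
        if current and used + kb > limit_kb:
            rounds.append(current)
            current, used = [], 0
        current.append(field)
        used += kb
    if current:
        rounds.append(current)
    return rounds


def srsbuild_commands(library, rounds, xdir, odir="SRSINX:", env="unix"):
    """Format one srsbuild command line per round, mimicking srscheck output."""
    cmds = []
    for fields in rounds:
        field_list = "".join(" " + f for f in fields)
        cmds.append(
            f"srsbuild {library} -f '{field_list}' -xdir '{xdir}' "
            f"-odir '{odir}' -env '{env}'"
        )
    return cmds


if __name__ == "__main__":
    fields = ["ID", "DAT", "ACC", "DEF", "KEY", "ORG",
              "AUT", "TIT", "REF", "FTS", "SL"]
    # Hypothetical estimate: every index needs about 3000 kb.
    sizes = [(f, 3000) for f in fields]
    for cmd in srsbuild_commands("GENBANK", split_rounds(sizes, 10000),
                                 "/trash/etzold/"):
        print(cmd)
```

With these made-up sizes, a 10000 kb budget yields rounds of three, three,
three and two indices, and a 100000 kb budget puts all eleven in one round.
The grouping happens to match the shape of the real output, but real
estimates will differ from index to index.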


Note that index compression is ALWAYS done one index at a time!


>Another question: if I index just genbank on a different machine,
>is it likely that I will run into similar problems if I run srsbuild
>to do the cross-indexing on this original server?
>

Building the cross-indices needs much less memory and has never been a
problem so far.

regards
Thure


===============================================================================
Thure Etzold                                   | EMBL
E-mail: etzold at embl-heidelberg.de              | Postfach 10.2209
Tel: (49) 6221 387529                          | 69012 Heidelberg
Fax: (49) 6221 387517                          | Germany




