Apologies for forwarding a raw dialog, but there are several interesting
issues scattered in it. ZmDB is at www.zmdb.iastate.edu/.
- Dave
~~~~~~~~~~~~~
Date: Mon, 17 Apr 2000 14:37:51 -0500
From: xgai <xgai at iastate.edu>
Subject: Re: slow loading ACEDB
Dear Dr. Matthews:
Thank you for your quick response.
I checked the memory usage and xace uses only 8% of the memory. Our server
has 0.5 GB of memory. As you noticed, the Sequence model that I built is
very simple, much less complicated than the model used by ACeDB. I actually
wonder how the people who maintain ACeDB manage to load such a large amount
of data. Their database is much larger and much more complicated.
I tried dividing the big .ace file into several smaller files. I tried
loading the database in phases, saving the data, quitting and reopening it
in between. That does not solve the problem. The total number of Sequence
objects seems to be the deciding factor. I realize that updating the index
file should take longer and longer as the index grows. However, it is the
precipitous drop in performance that puzzles me.
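
The splitting step described above can be sketched roughly as follows. This
is only an illustration, assuming the .ace file keeps each object in its own
paragraph with blank lines between objects; the function name split_ace and
the sample tags are hypothetical, not part of ACEDB.

```python
import re


def split_ace(text, objects_per_chunk):
    """Split .ace text into chunks of at most objects_per_chunk objects.

    Assumption: objects are separated by one or more blank lines.
    """
    # Break the text into paragraphs and drop empty ones.
    objects = [p.strip("\n") for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks = []
    for start in range(0, len(objects), objects_per_chunk):
        group = objects[start:start + objects_per_chunk]
        # Re-join with a blank line so each chunk is still paragraph-separated.
        chunks.append("\n\n".join(group) + "\n")
    return chunks


# Hypothetical sample data, for illustration only.
sample = 'Sequence : "est1"\nDNA "est1"\n\nSequence : "est2"\n\nSequence : "est3"\n'
chunks = split_ace(sample, 2)
```

Each chunk could then be written to its own file and loaded and timed
separately, to see at which object count the slowdown starts.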
In fact, this time the sequences that I am trying to load do not have "EST"
as the keyword (I found it is not that useful anyway since all my sequences
are ESTs).
One more question: Do you know any good ACEDB documents that are more like
an ACEDB database manager's guide? It has been very frustrating for me to
try to figure a lot of things out and I can not find such a document for
reference on all of the acedb links or web sites that I can find.
I am really thankful that you spent time answering my questions. I greatly
appreciate it.
Best wishes,
At 03:01 PM 4/17/00 -0400, you wrote:
>Hi Xiaowu,
>
>Interesting question. Looks like a good one for the ACEDB newsgroup.
>Lots of folks are building bigger databases nowadays, and no doubt starting
>to hit performance problems. I have two thoughts/suggestions:
>
>1. machine
>
>The symptoms sound like you're running out of RAM and starting to use the
>swap space (virtual memory). This would always slow things down hugely.
>Even worse if your swap space is on the same disk with database/block*wrm
>and/or the .ace file. You can monitor memory usage while the database is
>loading with "ps" and "vmstat". I like "top" even better; it's usually
>included with Linux.
>
>I noticed this with GrainGenes when the number of Sequence records went
>past about 30K. On my old Sparc2 with 64 MB RAM it now takes over two hours
>to load the whole database. On a newer machine with 1 GB RAM (Intel Solaris)
>it's 4 minutes.
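
Besides "ps", "vmstat" and "top", swap usage during a load can also be
sampled from a script. The following is a minimal sketch, assuming a Linux
system whose /proc/meminfo reports SwapTotal and SwapFree in kB; the function
names are made up for illustration.

```python
def parse_meminfo(text):
    """Parse 'Key:  value kB' lines into a dict of integers (kB)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        # Keep only lines whose first field after the colon is a number.
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def swap_used_kb(info):
    """Swap in use, in kB; 0 if the swap fields are missing."""
    return info.get("SwapTotal", 0) - info.get("SwapFree", 0)


def read_swap_used_kb(path="/proc/meminfo"):
    """Read the current swap usage from the given meminfo file."""
    with open(path) as handle:
        return swap_used_kb(parse_meminfo(handle.read()))
```

Calling read_swap_used_kb() every few seconds while the parser runs would
show whether the slowdown coincides with swap usage starting to climb.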
>
>2. ACEDB
>
>One thing ACEDB doesn't handle very well is records that have huge numbers
>of links from them, tens of thousands. I just looked at the Sequence model
>for the online ZmDB, and it looks like you've cleaned it of XREFs that
>might cause this kind of trouble. But I notice many of your Sequences have
>Keyword "EST". The ?Keyword class is usually automatically XREF'd; I don't
>know how to prevent it. Also, trying to query or browse ZmDB for Keyword
>EST, or 'find Sequence keyword=est', all fail.
>
>Did you look in database/log.wrm for complaints? As you may know, this one
>is common:
>
>2000-04-14_10:20:10 genome 16041 Class Text, object 93.09 has
>8729 > 3000 cells.
> This is just a warning, acedb has no hard limits on the mumber of cells
> per object, but the performances degrade on very large objects
> Either, you are cross referencing many entries into a single object,
> it may not be useful, and you could drop the XREF in the model and get the
> same info via an occasional query or, continually, via a subclass,
> or This object is Class:?Text, and you should rather use plain Text in
> the model
> or define a controlled vocabulary by giving an explicit list of tags
> This message is issued once per offending class and per code run
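
A log file can be scanned for this warning with a short script. The sketch
below is only an assumption-laden illustration: the regular expression is
modelled on the single log entry quoted above, and the function name is
hypothetical.

```python
import re

# Pattern modelled on the quoted entry:
#   "Class Text, object 93.09 has 8729 > 3000 cells."
CELLS_WARNING = re.compile(
    r"Class\s+(\S+),\s+object\s+(.+?)\s+has\s+(\d+)\s+>\s+(\d+)\s+cells"
)


def find_large_objects(log_text):
    """Return (class, object, cells, limit) for each matching warning."""
    return [
        (m.group(1), m.group(2), int(m.group(3)), int(m.group(4)))
        for m in CELLS_WARNING.finditer(log_text)
    ]


# The entry from the message above, without the mail quoting.
entry = ("2000-04-14_10:20:10 genome 16041 Class Text, object 93.09 has\n"
         "8729 > 3000 cells.")
hits = find_large_objects(entry)
```

Since the message says it is issued only once per offending class and per
run, a scan like this lists affected classes rather than every large object.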
>
>
>I'd be interested to hear if you identify any smoking guns.
>
>cheers,
>- Dave
>
>
> > From xgai at iastate.edu Mon Apr 17 10:56:30 2000
> > Subject: Please help
> >
> > I need your help. Our database, ZmDB, is an ACEDB database. I found an
> > interesting problem with ACEDB that confuses and frustrates me. When I used
> > the Ace parser to load a large .ace file, it was flying at first
> > (relatively) and parsed thousands of records in a few minutes. However,
> > after a while it seemed to reach a critical point where it became terribly
> > slow (about one record per minute). I am very frustrated and puzzled. I
> > cannot find anything about this in the ACEDB documents. Can you shed some
> > light on it? Thank you.
> >
> > Some details: we are running ACEDB on a Linux box (Red Hat 5.2). I tried
> > to load about 50,000 records, all of the same type (sequence data).
Xiaowu Gai
2104 Molecular Biology Building
Department of Zoology & Genetics
Iowa State University
Ames, IA 50010
Tel: (515)-2940022
---