performance problems in loading
zgudmunt at my-deja.com
Tue Apr 18 09:20:36 EST 2000
I would like to pitch in some information about my own experience with
rather large (from my viewpoint) Ace-dbases. I maintain, among other
things, a local version of the human BAC ends collection. These are
between 700 and 800 thousand sequence objects, each linking back to a
BAC-parent (just over 400 thousand BACs). The whole thing is taking
almost 2 GB on disk but did, admittedly, take a while to load from the
500 MB Acefile.
My hardware is a 450 MHz PIII with around 500 MB RAM. When querying
the thing, it performs quite well and only slows down when I do
something silly like querying ALL the Sequence-objects by some of their
tags, so all access-methods (AcePerl mostly) are designed to NOT make
that sort of query! My way around that sort of problem, like when
looking for a sequence with a particular GenBank-accession, is to make a
model for a GenBank_accession-object that links back to the sequence
itself, eliminating the need to check each and every sequence for its
accession. Quite similar to indexing a column in a relational table.
But, this may be old news to veteran Ace-ers :I
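A minimal sketch of that lookup idea, in plain Python rather than AcePerl
or ACEDB itself. The record names, field names and accessions below are
made up for illustration; the only point is the difference between
checking every sequence and following a link from a small accession
object back to its sequence.

```python
# Toy stand-ins for Sequence objects. All names and values are hypothetical.
sequences = [
    {"name": "BACEND_0001", "genbank_accession": "ACC0001", "bac_parent": "BAC_A"},
    {"name": "BACEND_0002", "genbank_accession": "ACC0002", "bac_parent": "BAC_A"},
    {"name": "BACEND_0003", "genbank_accession": "ACC0003", "bac_parent": "BAC_B"},
]


def find_by_scan(seqs, accession):
    """Check each sequence in turn; cost grows with the number of sequences."""
    for seq in seqs:
        if seq["genbank_accession"] == accession:
            return seq["name"]
    return None


def build_accession_index(seqs):
    """Build one entry per accession, each pointing back to its sequence.

    This plays the role of the separate accession objects: the lookup
    starts from the accession instead of from the sequences.
    """
    return {seq["genbank_accession"]: seq["name"] for seq in seqs}


def find_by_index(index, accession):
    """Follow the link from the accession back to the sequence name."""
    return index.get(accession)


accession_index = build_accession_index(sequences)
```

With the index built once, `find_by_index(accession_index, "ACC0002")`
gives the same answer as `find_by_scan(sequences, "ACC0002")` without
touching the other sequences, which is the same trade-off as indexing a
column in a relational table.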
So, I hope this helps with the hardware/performance-discussion. I
will soon expand this setup to incorporate other BAC-data (ePCR,
hybridizations etc) which will increase my cross-reference count a great
deal. I'll keep you posted on how it affects my performance.
And one more thing; I came across some documentation on a
benchmarking utility called Aquila. Is that still supported and used? I
had trouble finding the software to download. Can someone help me on
that one? Thanks,
In article <200004180057.UAA06408 at greengenes.cit.cornell.edu>,
matthews at greengenes.cit.cornell.edu (Dave Matthews) wrote:
> Apologies for forwarding a raw dialog, but there are several
> issues scattered in it. ZmDB is at www.zmdb.iastate.edu/.
> - Dave
> Date: Mon, 17 Apr 2000 14:37:51 -0500
> From: xgai <xgai at iastate.edu>
> Subject: Re: slow loading ACEDB
> Dear Dr. Matthews:
> Thank you for your quick response.
> I checked the memory usage and xace uses only 8% of the memory. Our
> has 0.5 GB memory. As you noticed, the Sequence model that I built is
> very simple model, it is much less complicated than the model used by
> ACeDB. I actually wondered how people who maintain the ACeDB did to
> the large amount of data. Their database is much larger and much more
> I tried to divide the big .ace file into several smaller files. I
> load the database by phases and quit, save the data and reopen it in
> between. It does not solve the problem. The total number of the
> objects seems to be the magical number. I realized that it should take
> longer and longer to update the index file when the index got much
> However, it is the precipitous drop of performance that puzzles me.
> In fact, this time the sequences that I am trying to load do not have
> as the keyword (I found it is not that useful anyway since all my
> are ESTs).
> One more question: Do you know any good ACEDB documents that are more
> an ACEDB database manager's guide? It has been very frustrating for me
> try to figure a lot of things out and I can not find such a document
> reference on all of the acedb links or web sites that I can find.
> I am really thankful that you spent time answering my questions. I
> appreciate it.
> Best wishes,
> At 03:01 PM 4/17/00 -0400, you wrote:
> >Hi Xiaowu,
> >Interesting question. Looks like a good one for the ACEDB newsgroup.
> >Lots of folks are building bigger databases nowadays, and no doubt
> >to hit performance problems. I have two thoughts/suggestions:
> >1. machine
> >The symptoms sound like you're running out of RAM and starting to use
> >swap space (virtual memory). This would always slow things down
> >Even worse if your swap space is on the same disk with
> >and/or the .ace file. You can monitor memory usage while the
> >loading with "ps" and "vmstat". I like "top" even better; it's
> >included with Linux.
> >I noticed this with GrainGenes when the number of Sequence records
> >past about 30K. On my old Sparc2 with 64 MB RAM it now takes over
> >to load the whole database. On a newer machine with 1 GB RAM (Intel
> >it's 4 minutes.
> >2. ACEDB
> >One thing ACEDB doesn't handle very well is records that have huge
> >of links from them, tens of thousands. I just looked at the Sequence
> >for the online ZmDB, and it looks like you've cleaned it of XREFs
> >might cause this kind of trouble. But I notice many of your
> >Keyword "EST". The ?Keyword class is usually automatically XREF'd; I
> >know how to prevent it. Also, trying to query or browse ZmDB for
> >EST, or 'find Sequence keyword=est', all fail.
> >Did you look in database/log.wrm for complaints? As you may know,
> >is common:
> >2000-04-14_10:20:10 genome 16041 Class Text, object 93.09 has
> >8729 > 3000 cells.
> > This is just a warning, acedb has no hard limits on the mumber of
> > per object, but the performances degrade on very large objects
> > Either, you are cross referencing many entries into a single
> > it may not be useful, and you could drop the XREF in the model and
> > same info via an occasional query or, continually, via a subclass,
> > or This object is Class:?Text, and you should rather use plain Text
> > the model
> > or define a controlled vocabulary by giving an explicit list of
> > This message is issued once per offending class and per code run
> >I'd be interested to hear if you identify any smoking guns.
> >- Dave
> > > From xgai at iastate.edu Mon Apr 17 10:56:30 2000
> > > Subject: Please help
> > >
> > > I need your help. Our database, ZmDB, is an ACEDB database. I
> > > interesting problem with ACEDB that confuses and frustrates me.
> > > When I used
> > > the Ace parser to load a large .ace file, it first was flying
> > > and parsed thousands of records in a few minutes. However, after a
> > > it seemed reach to a critical point where it becomes terribly slow
> > > one record per minute). I am very frustrated and puzzled. I can
> > > anywhere in ACEDB documents that says anything about it? Can you
> > > lights on it? Thank you.
> > >
> > > Some details: we are running ACEDB on a linux box (RedHat5.2). The
> > > of records that I tried to load is about 50,000 of the same type
> > > data.
> Xiaowu Gai
> 2104 Molecular Biology Building
> Department of Zoology & Genetics
> Iowa State University
> Ames, IA 50010
> Tel: (515)-2940022