I would like to pitch in some information about my own experience with
rather large (from my viewpoint) ACEDB databases. I maintain, among other
things, a local version of the human BAC-end collection: between 700 and
800 thousand sequence objects, each linking back to a parent BAC (just
over 400 thousand BACs). The whole thing takes almost 2 GB on disk and,
admittedly, took a while to load from the 500 MB .ace file.

My hardware is a 450 MHz PIII with around 500 MB of RAM. The database
performs quite well under queries and only slows down when I do
something silly, like querying ALL the Sequence objects by one of their
tags, so all my access methods (mostly AcePerl) are designed NOT to make
that sort of query! My way around that problem, for example when
looking for a sequence with a particular GenBank accession, is to make a
GenBank_accession model whose objects link back to the sequence itself,
eliminating the need to check each and every sequence for its accession.
It is quite similar to indexing a column in a relational table. But this
may be old news to veteran Ace-ers :I
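
To make the indexing idea concrete, here is a minimal Python sketch that
models it outside ACEDB. It assumes a plain dictionary of sequence name
to accession as a stand-in for the Sequence class; the names and
accession strings are hypothetical, and this is not AcePerl or ACEDB
code. The scan function mirrors asking every Sequence for its accession,
while the index mirrors GenBank_accession objects that point straight
back to their sequence.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical stand-in for ACEDB Sequence objects: sequence name -> accession.
# Real ACEDB keeps this as a tag on each Sequence object; this is only a model.
Sequences = Dict[str, str]


def find_by_scan(sequences: Sequences, accession: str) -> List[str]:
    """The slow way: ask every sequence for its accession (cost grows with n)."""
    return [name for name, acc in sequences.items() if acc == accession]


def build_accession_index(sequences: Sequences) -> Dict[str, List[str]]:
    """Build the reverse links once, like GenBank_accession objects that
    point back to their Sequence."""
    index: Dict[str, List[str]] = defaultdict(list)
    for name, acc in sequences.items():
        index[acc].append(name)
    return dict(index)


def find_by_index(index: Dict[str, List[str]], accession: str) -> List[str]:
    """The fast way: a single lookup, no matter how many sequences exist."""
    return index.get(accession, [])


if __name__ == "__main__":
    seqs = {"bac1.fwd": "ACC001", "bac1.rev": "ACC002", "bac2.fwd": "ACC003"}
    index = build_accession_index(seqs)
    print(find_by_scan(seqs, "ACC002"))   # ['bac1.rev']
    print(find_by_index(index, "ACC002"))  # ['bac1.rev']
```

The trade-off is the usual one for an index: the reverse links cost
extra space and have to be created at load time, in exchange for cheap
lookups afterwards.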

So, I hope this helps with the hardware/performance discussion. I will
soon expand this setup to incorporate other BAC data (ePCR,
hybridizations, etc.), which will increase my cross-reference count a
great deal. I'll keep you posted on how it affects performance.

And one more thing: I came across some documentation on a benchmarking
utility called Aquila. Is it still supported and used? I had trouble
finding the software to download. Can someone help me with that one?
Thanks,
Mummi, Iceland
In article <200004180057.UAA06408 at greengenes.cit.cornell.edu>,
matthews at greengenes.cit.cornell.edu (Dave Matthews) wrote:
> Apologies for forwarding a raw dialog, but there are several interesting
> issues scattered in it. ZmDB is at www.zmdb.iastate.edu/.
> - Dave
>
> ~~~~~~~~~~~~~
> Date: Mon, 17 Apr 2000 14:37:51 -0500
> From: xgai <xgai at iastate.edu>
> Subject: Re: slow loading ACEDB
>
> Dear Dr. Matthews:
>
> Thank you for your quick response.
>
> I checked the memory usage, and xace uses only 8% of the memory. Our server
> has 0.5 GB of memory. As you noticed, the Sequence model that I built is a
> very simple one, much less complicated than the model used by ACeDB. I
> actually wondered how the people who maintain ACeDB manage to load such a
> large amount of data. Their database is much larger and much more
> complicated.
>
> I tried dividing the big .ace file into several smaller files and loading
> the database in phases, quitting, saving the data and reopening it in
> between. That did not solve the problem. The total number of sequence
> objects seems to be the magic number. I realize that updating the index
> file should take longer and longer as the index grows, but it is the
> precipitous drop in performance that puzzles me.
>
> In fact, this time the sequences that I am trying to load do not have
> "EST" as their keyword (I found it is not that useful anyway, since all my
> sequences are ESTs).
>
> One more question: do you know of any good ACEDB documents that are more
> like an ACEDB database manager's guide? It has been very frustrating to try
> to figure a lot of things out, and I cannot find such a reference document
> on any of the ACEDB links or web sites that I can find.
>
> I am really thankful that you spent time answering my questions. I greatly
> appreciate it.
>
> Best wishes,
>
> At 03:01 PM 4/17/00 -0400, you wrote:
> >Hi Xiaowu,
> >
> >Interesting question. Looks like a good one for the ACEDB newsgroup.
> >Lots of folks are building bigger databases nowadays, and no doubt starting
> >to hit performance problems. I have two thoughts/suggestions:
> >
> >1. machine
> >
> >The symptoms sound like you're running out of RAM and starting to use the
> >swap space (virtual memory). This always slows things down hugely, and
> >it's even worse if your swap space is on the same disk as
> >database/block*wrm and/or the .ace file. You can monitor memory usage
> >while the database is loading with "ps" and "vmstat". I like "top" even
> >better; it's usually included with Linux.
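
If you would rather log swap usage from a script than watch top, a
minimal Python sketch like the one below reads the Linux /proc/meminfo
file, which reports SwapTotal and SwapFree in kB. Taking a reading every
so often while the .ace file loads and seeing the "used" figure climb is
a rough sign that the machine has started swapping. The function names
are illustrative, and /proc/meminfo exists only on Linux.

```python
import os
from typing import Dict

MEMINFO = "/proc/meminfo"  # Linux only


def parse_meminfo(text: str) -> Dict[str, int]:
    """Turn lines like 'SwapFree:  1000000 kB' into {'SwapFree': 1000000}."""
    fields: Dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields


def swap_used_kb(fields: Dict[str, int]) -> int:
    """Swap in use, in kB (0 if the swap fields are missing)."""
    return fields.get("SwapTotal", 0) - fields.get("SwapFree", 0)


if __name__ == "__main__" and os.path.exists(MEMINFO):
    # One reading; rerun periodically (e.g. from a shell loop) during the load.
    with open(MEMINFO) as f:
        print("swap used (kB):", swap_used_kb(parse_meminfo(f.read())))
```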
> >
> >I noticed this with GrainGenes when the number of Sequence records went
> >past about 30K. On my old Sparc2 with 64 MB RAM it now takes over two hours
> >to load the whole database. On a newer machine with 1 GB RAM (Intel Solaris)
> >it takes 4 minutes.
> >
> >2. ACEDB
> >
> >One thing ACEDB doesn't handle very well is records with huge numbers of
> >links from them, tens of thousands. I just looked at the Sequence model for
> >the online ZmDB, and it looks like you've cleaned it of XREFs that might
> >cause this kind of trouble. But I notice many of your Sequences have
> >Keyword "EST". The ?Keyword class is usually automatically XREF'd; I don't
> >know how to prevent it. Also, querying or browsing ZmDB for Keyword EST,
> >or running 'find Sequence keyword=est', all fail.
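
One way to look for this kind of hub object before loading is to count,
in the .ace file itself, how many objects point at each value of a
suspect tag such as Keyword. The Python sketch below does that under
simplifying assumptions: objects are separated by blank lines, the first
line of each object is its 'Class : "name"' header, and every other line
starts with a tag followed by a value. It ignores comments, deletions and
extra values on a line, so treat it as a rough screen rather than a full
.ace parser. The default threshold of 3000 is borrowed from the cell
warning quoted below and is only a loose proxy, since cells and links are
not the same thing.

```python
import re
import shlex
from collections import Counter
from typing import List, Tuple


def count_tag_values(ace_text: str, tag: str) -> Counter:
    """Count how many objects in .ace text carry each value of `tag`.

    Simplified: blank lines separate objects, line 1 is the object header,
    and other lines look like 'Tag "value"'. Only the first value is used.
    """
    counts: Counter = Counter()
    for paragraph in re.split(r"\n\s*\n", ace_text.strip()):
        lines = paragraph.strip().splitlines()
        for line in lines[1:]:  # skip the 'Class : "name"' header line
            tokens = shlex.split(line)  # handles "quoted values"
            if len(tokens) >= 2 and tokens[0] == tag:
                counts[tokens[1]] += 1
    return counts


def hub_values(counts: Counter, threshold: int = 3000) -> List[Tuple[str, int]]:
    """Values referenced by more than `threshold` objects, largest first."""
    return [(value, n) for value, n in counts.most_common() if n > threshold]
```

Run with tag "Keyword" against an .ace file like the one described above,
this would be expected to show "EST" attached to a very large number of
Sequences, which is the kind of object that slows ACEDB down.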
> >
> >Did you look in database/log.wrm for complaints? As you may know, this one
> >is common:
> >
> >2000-04-14_10:20:10 genome 16041 Class Text, object 93.09 has 8729 > 3000 cells.
> > This is just a warning, acedb has no hard limits on the number of cells
> > per object, but the performances degrade on very large objects
> > Either, you are cross referencing many entries into a single object,
> > it may not be useful, and you could drop the XREF in the model and get the
> > same info via an occasional query or, continually, via a subclass,
> > or This object is Class:?Text, and you should rather use plain Text in
> > the model
> > or define a controlled vocabulary by giving an explicit list of tags
> > This message is issued once per offending class and per code run
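
To pull every such warning out of database/log.wrm at once, a small
Python sketch can scan the log with a regular expression. The pattern is
modelled on the single warning line quoted above, so its exact wording in
other ACEDB versions is an assumption; the function and type names are
hypothetical.

```python
import os
import re
from typing import List, NamedTuple

# Modelled on: "Class Text, object 93.09 has 8729 > 3000 cells."
# Other ACEDB versions may word this differently (assumption).
BIG_OBJECT = re.compile(
    r"Class (?P<cls>\S+), object (?P<obj>\S+) has\s+(?P<cells>\d+) > (?P<limit>\d+) cells"
)


class BigObject(NamedTuple):
    cls: str    # ACEDB class of the oversized object
    obj: str    # object identifier as printed in the log
    cells: int  # number of cells reported
    limit: int  # warning threshold reported


def find_big_objects(log_text: str) -> List[BigObject]:
    """Return one entry per 'too many cells' warning found in the log text."""
    return [
        BigObject(m["cls"], m["obj"], int(m["cells"]), int(m["limit"]))
        for m in BIG_OBJECT.finditer(log_text)
    ]


if __name__ == "__main__" and os.path.exists("database/log.wrm"):
    with open("database/log.wrm") as f:
        for hit in find_big_objects(f.read()):
            print(hit)
```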
> >
> >
> >I'd be interested to hear if you identify any smoking guns.
> >
> >cheers,
> >- Dave
> >
> >
> > > From xgai at iastate.edu Mon Apr 17 10:56:30 2000
> > > Subject: Please help
> > >
> > > I need your help. Our database, ZmDB, is an ACEDB database. I found an
> > > interesting problem with ACEDB that confuses and frustrates me. When I
> > > used the Ace parser to load a large .ace file, it was (relatively) flying
> > > at first and parsed thousands of records in a few minutes. However, after
> > > a while it seemed to reach a critical point where it became terribly slow
> > > (about one record per minute). I am very frustrated and puzzled. I cannot
> > > find anything in the ACEDB documents about it. Can you shed some light on
> > > it? Thank you.
> > >
> > > Some details: we are running ACEDB on a Linux box (Red Hat 5.2). The
> > > number of records that I tried to load is about 50,000, all sequence
> > > data of the same type.
>
> Xiaowu Gai
> 2104 Molecular Biology Building
> Department of Zoology & Genetics
> Iowa State University
> Ames, IA 50010
> Tel: (515)-2940022
> ---
Sent via Deja.com http://www.deja.com/