performance of field-specified query vs. "follow"

Dave Matthews matthews at GREENGENES.CIT.CORNELL.EDU
Mon Jul 27 21:22:08 EST 1998


I've been noticing how fast the "follow" query command seems relative to
querying on the value of a field, and that a query on any XREF'd field can
be framed either way.  So I ran a quantitative comparison in tace; see
below.  The result is that "follow" is a heck of a lot faster.

My question is, could the code be made to check whether a queried field is
XREF'd, and if so automatically use a "follow" approach?
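
To make the idea concrete, here is a minimal sketch in Python.  It is a toy
in-memory model, not AceDB code: build_toy_db, scan_query, follow_query,
planned_query and the XREF table are all hypothetical names made up for
illustration.  It only shows why the two framings return the same objects
while opening very different numbers of them.

```python
# Toy in-memory model of the idea; NOT AceDB code. All names are hypothetical.

def build_toy_db(n_probes=8, n_sequences=40):
    """Return (probes, sequences) with a mutually XREF'd Probe <-> Sequence link."""
    probes = {f"p{i}": {"Sequence": set()} for i in range(n_probes)}
    sequences = {f"s{i}": {"Probe": set()} for i in range(n_sequences)}
    # Link probe p{i} to sequence s{2*i}; XREF means both sides get filled in.
    for i in range(n_probes):
        p, s = f"p{i}", f"s{i * 2}"
        probes[p]["Sequence"].add(s)
        sequences[s]["Probe"].add(p)
    return probes, sequences


def scan_query(objects, tag):
    """Field-specified framing: open every object in the class and test the tag."""
    hits, visited = set(), 0
    for name, obj in objects.items():
        visited += 1
        if obj.get(tag):
            hits.add(name)
    return hits, visited


def follow_query(objects, tag):
    """'follow' framing: open each object on the other side and collect its links."""
    hits, visited = set(), 0
    for obj in objects.values():
        visited += 1
        hits |= obj.get(tag, set())
    return hits, visited


# (class, tag) -> (class on the other side, tag pointing back). Assumed layout.
XREF = {("Sequence", "Probe"): ("Probe", "Sequence")}


def planned_query(db, cls, tag):
    """Use the follow framing when the queried tag is XREF'd, else fall back to a scan."""
    if (cls, tag) in XREF:
        other_cls, back_tag = XREF[(cls, tag)]
        hits, visited = follow_query(db[other_cls], back_tag)
        return hits, visited, "follow"
    hits, visited = scan_query(db[cls], tag)
    return hits, visited, "scan"
```

With the default toy sizes, the scan opens all 40 sequences while the follow
route opens only the 8 probes, and both return the same 8 sequences.  Whether
the real code could do this cheaply is exactly my question.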


Tests (both machines negligibly loaded with extraneous processes):

1.
On greengenes, a Sparc2, 64 MB RAM:

acedb> find probe
                                       1 sec
// Found 8169 objects in this class

acedb> follow sequence
                                       7 sec
// Found 1636 objects

acedb> query find sequence probe     (64570 Sequence objects)
                                     150 sec
// Found 1636 objects

acedb> query find sequence probe     (Repeat, after caching.)
                                     100 sec
// Found 1636 objects


2.
On probe.nalusda.gov, a more competent machine:

acedb> find probe
                                      1 sec
// Found 8169 objects in this class

acedb> follow sequence
                                      9 sec
// Found 1636 objects

acedb> query find sequence probe
                                     35 sec
// Found 1636 objects

acedb> query find sequence probe
                                     35 sec
// Found 1636 objects


In both cases wspec/cachesize.wrm says:

CACHE1 = 2000     // Size of first cache, as used in w5/blocksubs.c
CACHE2 = 2000     // Size of second cache, as used in w5/objcache.c
DISK =  10000     // Initial size of database on disk at creation

I don't know if these values are optimal, and would appreciate advice on that.
Regardless, it seems like "follow" is a winner.
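
To put numbers on that, a small Python sketch of the arithmetic follows.  The
timings are copied from the sessions above; counting "find probe" plus
"follow sequence" together as the cost of the follow route is my own
assumption, and the function name speedups is just for illustration.

```python
# Wall-clock seconds copied from the tace sessions in this message.
TIMINGS = {
    "greengenes": {"find": 1, "follow": 7, "query": [150, 100]},
    "probe.nalusda.gov": {"find": 1, "follow": 9, "query": [35, 35]},
}


def speedups(timings):
    """Ratio of each field-specified query time to the find + follow time."""
    out = {}
    for host, t in timings.items():
        follow_total = t["find"] + t["follow"]  # assumed cost of the follow route
        out[host] = [q / follow_total for q in t["query"]]
    return out


if __name__ == "__main__":
    for host, ratios in speedups(TIMINGS).items():
        print(host, ["%.2fx" % r for r in ratios])
```

On greengenes that works out to 18.75x for the first query and 12.5x after
caching; on probe.nalusda.gov it is 3.5x both times.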

Does Aquila test query times on large datasets?

- Dave



