I've been noticing how fast the "follow" query command seems compared with
querying on the value of a field, and that a query on any XREF'd field can
be framed either way. So I ran a quantitative comparison in tace; see
below. The result is that "follow" is much faster: counting the initial
"find", roughly 12 to 19 times faster on one machine and 3.5 times faster
on the other.
My question is, could the code be made to check whether a queried field is
XREF'd, and if so automatically use a "follow" approach?
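To make the question concrete, here is a minimal sketch of the idea as a
toy in-memory model in Python. It is not ACEDB code: the XREF table, the
database layout, and the functions scan, follow and find_with_tag are all
hypothetical names made up for illustration, and the "smaller class wins"
rule is only my assumption about when following would pay off.

```python
# Toy model only; nothing here comes from the ACEDB source.
# A database maps class name -> object name -> tag -> set of referenced names.

# Hypothetical schema table: (class, tag) -> (target class, back tag).
XREF = {
    ("Sequence", "Probe"): ("Probe", "Sequence"),
    ("Probe", "Sequence"): ("Sequence", "Probe"),
}

def scan(db, cls, tag):
    """Test every object in cls for a non-empty tag (the slow path)."""
    return {name for name, tags in db[cls].items() if tags.get(tag)}

def follow(db, cls, tag):
    """Collect every object referenced through tag from the objects in cls."""
    found = set()
    for tags in db[cls].values():
        found |= tags.get(tag, set())
    return found

def find_with_tag(db, cls, tag):
    """Answer 'find cls tag', following the XREF from the other side
    when one exists and the other class is smaller."""
    xref = XREF.get((cls, tag))
    if xref is None:
        return scan(db, cls, tag)
    other_cls, back_tag = xref
    if len(db[other_cls]) < len(db[cls]):
        return follow(db, other_cls, back_tag)
    return scan(db, cls, tag)

# Tiny example: 3 probes, 4 sequences, cross-references kept consistent.
db = {
    "Probe": {
        "p1": {"Sequence": {"s1"}},
        "p2": {"Sequence": {"s1", "s2"}},
        "p3": {},
    },
    "Sequence": {
        "s1": {"Probe": {"p1", "p2"}},
        "s2": {"Probe": {"p2"}},
        "s3": {},
        "s4": {},
    },
}

result = find_with_tag(db, "Sequence", "Probe")
```

In this toy case the result is the same set either way, but the "follow"
path visits 3 Probe objects instead of 4 Sequence objects. The rewrite is
only valid if the cross-references are guaranteed to be kept in step on
both sides.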
Tests (both machines negligibly loaded with extraneous processes):
1. On greengenes, a Sparc2 with 64 MB RAM:
acedb> find probe
1 sec
// Found 8169 objects in this class
acedb> follow sequence
7 sec
// Found 1636 objects
acedb> query find sequence probe (64570 Sequence objects)
150 sec
// Found 1636 objects
acedb> query find sequence probe (Repeat, after caching.)
100 sec
// Found 1636 objects
2. On probe.nalusda.gov, a more capable machine:
acedb> find probe
1 sec
// Found 8169 objects in this class
acedb> follow sequence
9 sec
// Found 1636 objects
acedb> query find sequence probe
35 sec
// Found 1636 objects
acedb> query find sequence probe
35 sec
// Found 1636 objects
In both cases wspec/cachesize.wrm says:
CACHE1 = 2000 // Size of first cache, as used in w5/blocksubs.c
CACHE2 = 2000 // Size of second cache, as used in w5/objcache.c
DISK = 10000 // Initial size of database on disk at creation
I don't know whether these values are optimal, and would appreciate advice
on that.
Regardless, it seems like "follow" is a winner.
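For anyone who wants the ratios spelled out, the short Python sketch below
works them out from the timings above. The one assumption is mine: I count
the "find probe" step as part of the cost of the "follow" route, since
"follow" needs that active list first. The dictionary keys and the
speedups function are just illustrative names.

```python
# Timings in seconds, copied from the two tests above.
timings = {
    "greengenes": {"find": 1, "follow": 7,
                   "query_first": 150, "query_repeat": 100},
    "probe.nalusda.gov": {"find": 1, "follow": 9,
                          "query_first": 35, "query_repeat": 35},
}

def speedups(t):
    """Return (first-run, repeat-run) speedup of find+follow over query."""
    follow_total = t["find"] + t["follow"]  # assumption: include the find
    return t["query_first"] / follow_total, t["query_repeat"] / follow_total

for host, t in timings.items():
    first, repeat = speedups(t)
    print(f"{host}: {first:.2f}x on first run, {repeat:.2f}x on repeat")
```

If the "find" second is left out, the ratios come out slightly higher, so
the conclusion does not depend on that choice.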
Does Aquila test query times on large datasets?
- Dave