indexing

Danielle et jean Thierry-Mieg mieg at kaa.crbm.cnrs-mop.fr
Fri Oct 17 06:04:17 EST 1997


This year, I have been expanding the indexing in acedb.

It works as follows.

Objects in acedb (see "http://probe.nalusda.gov:8000/acedocs")
are like trees.

Example:


Locus : "ace-1"
  Type    Gene    Reference_Allele    p1000
                  Phenotype   p1000 : class A acetylcholinesterase reduced 100%; no behavioral phenotype alone (ES1 
                                ME3) but ace-2;ace-1 is uncoordinated (hypercontracted) and ace-2;ace-3;ace-1 is L1 
                                lethal. NA7.
                              See also e1572
  Molecular_information   Sequence    EM:CERNAACE.1
                                      EM:CERNAACE.2
                                      F01G12.4
                                      W09B12.1
                                      EMBL:CERNAACE.1
                                      EMBL:CERNAACE.2
  Map     X   Position    24.1192     Error   0.009721
  Positive    Positive_clone      W09B12
  Mapping_data    2_point     ace-2; ace-3; ace-1/unc-3
                  Multi_point     195
                                  196
                  Pos_neg_data    mnDf4 does not delete ace-1.
                                  mnDp1 includes ace-1.
                                  mnDp8 includes ace-1.
                                  mnDp9 includes ace-1.
                                  mnDp25 does not include ace-1.
                                  mnDp27 includes ace-1.
                                  mnDf4 does not delete ace-1.
                                  mnDf41 deletes ace-1.
                                  mnDf42 deletes ace-1.
                                  mnDf8 does not delete ace-1.
                                  mnDf3 does not delete ace-1.
                                  mnDf1 deletes ace-1.
                                  mnDp1 includes ace-1.
                                  mnDp14 includes ace-1.
                                  mnDf41 deletes ace-1.
  Allele      e1572
              p1000
  Strain      ace-2(g72)I; ace-1(p1000)X.
             unc-3(p1001) ace-1(p1000)X.
  Reference   Isolation, characterization and epistasis of fluoride-resistant mutants of Caenorhabditis elegans.
              cDNA sequence, gene structure, and in vitro expression of ace-1, a gene encoding acetylcholinesterase of 
                class-A in the nematode Caenorhabditis elegans.
              Analysis of the 5' Transcriptional Regulatory Region of the ace-1 Gene in C. elegans.
              ACE-1, THE GENE ENCODING ACETYLCHOLINESTERASE OF CLASS A IN C. ELEGANS AND C. BRIGGSAE
              Characterization of a null mutation in ace-1, the gene encoding class A acetylcholinesterase in the 
       


Each such tree is described by a schema.

Before an object goes to disk, a fingerprint of the object is taken and memorized as a
bitset with the following info:

presence of indexed tags + boolean answers to filters

An indexed tag is every tag of level 1,
in this case Type, Molecular_information, Map, Positive, Mapping_data, Allele, Strain, Reference,

plus any tag explicitly declared as indexed in a configuration file; suppose tag Gene is declared this way.

A filter is a possibly complex query local to a single object.
Example: Filter COUNT Reference > 5
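
A minimal sketch of such a fingerprint, in Python and not taken from acedb itself. It assumes an object can be modelled as a nested dict of tags, that each indexed tag and each filter gets a fixed bit position, and that tag Gene has been declared indexed. The names INDEXED_TAGS, FILTERS, has_tag and fingerprint are hypothetical, and the reference entries are placeholders.

```python
# Hypothetical model: an object is a nested dict {tag: subtree or list of values}.
INDEXED_TAGS = ["Type", "Molecular_information", "Map", "Positive",
                "Mapping_data", "Allele", "Strain", "Reference",
                "Gene"]  # Gene: assumed to be declared indexed in the config

# Each filter is a predicate local to one object; here: COUNT Reference > 5.
FILTERS = [lambda obj: len(obj.get("Reference", [])) > 5]


def has_tag(tree, tag):
    """Return True if `tag` occurs anywhere in the nested dict `tree`."""
    if not isinstance(tree, dict):
        return False
    return tag in tree or any(has_tag(sub, tag) for sub in tree.values())


def fingerprint(obj):
    """Pack tag presence and filter answers into one int used as a bitset."""
    bits = 0
    for i, tag in enumerate(INDEXED_TAGS):
        if has_tag(obj, tag):
            bits |= 1 << i
    for j, flt in enumerate(FILTERS, start=len(INDEXED_TAGS)):
        if flt(obj):
            bits |= 1 << j
    return bits


# A cut-down stand-in for the Locus "ace-1" shown above (placeholder values).
ace1 = {
    "Type": {"Gene": {"Reference_Allele": ["p1000"], "Phenotype": ["(text)"]}},
    "Map": {"X": {"Position": [24.1192]}},
    "Allele": ["e1572", "p1000"],
    "Reference": ["ref1", "ref2", "ref3", "ref4", "ref5"],
}
fp = fingerprint(ace1)
```

With five references the filter bit stays clear, while the bits for Type, Map, Allele, Reference and Gene are set.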

---------

Then we define subclasses as the sets of objects satisfying certain filters.
Members of subclasses are known without opening the disk.
  These subclasses are more an automatic classification
  than the subclasses of other OO systems.
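
As an illustration only, subclass membership can be read straight off the in-memory fingerprints. The sketch below assumes a hypothetical index mapping object names to integer bitsets; the bit layout and the fingerprints of the three loci are invented for the example.

```python
# Hypothetical bit layout for the example.
HAS_ALLELE = 1 << 0   # indexed tag Allele is present
MANY_REFS = 1 << 1    # answer to the filter COUNT Reference > 5

# Hypothetical in-memory index: object name -> fingerprint (invented values).
index = {
    "ace-1": HAS_ALLELE,
    "ace-2": HAS_ALLELE | MANY_REFS,
    "unc-3": MANY_REFS,
}


def subclass_members(index, required_bits):
    """Names whose fingerprint has every bit of `required_bits` set.

    Only the in-memory index is consulted; no object is read from disk.
    """
    return sorted(name for name, fp in index.items()
                  if (fp & required_bits) == required_bits)


well_referenced = subclass_members(index, MANY_REFS)
```

Here well_referenced is the subclass defined by the filter, computed without opening any object.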

Furthermore, when any query refers to the presence
of a tag and the object is not yet open, rather than
opening it, we first search the indices for the first
indexed tag above it.

Suppose I search for

Find Locus where allele = e1572

(Of course, I could instead search for allele e1572 and backtrack directly
to its gene, but suppose there is no double cross-referencing
between gene and allele.)

Then I can only find the answer among genes having at least one allele,
and this is known from the index, so I will only explore a more limited
number of genes.

If I search for Phenotype, I will look among those having tag Gene.

those are the possible
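
The candidate selection described above can be sketched as follows. This is an assumption-laden illustration, not acedb code: PARENT is a hypothetical fragment of the schema giving each tag's parent, INDEXED is the assumed set of indexed tags, and the index uses Python sets instead of a bitset for readability. The loci "locus-a" and "locus-b" are invented.

```python
# Hypothetical schema fragment: tag -> parent tag (None means level 1).
PARENT = {"Type": None, "Gene": "Type", "Phenotype": "Gene",
          "Reference_Allele": "Gene", "Allele": None}
INDEXED = {"Type", "Gene", "Allele"}  # level-1 tags plus the declared tag Gene


def nearest_indexed(tag):
    """Walk up from `tag` and return the first indexed tag, or None."""
    while tag is not None and tag not in INDEXED:
        tag = PARENT[tag]
    return tag


def candidates(index, tag):
    """Objects that may hold `tag`, judged from the in-memory index alone."""
    anchor = nearest_indexed(tag)
    if anchor is None:
        return sorted(index)  # nothing indexed above: every object is a candidate
    return sorted(name for name, tags in index.items() if anchor in tags)


# Hypothetical index: object name -> set of indexed tags present.
index = {"ace-1": {"Type", "Gene", "Allele"},
         "locus-a": {"Type"},
         "locus-b": {"Type", "Gene"}}

allele_candidates = candidates(index, "Allele")        # only loci with an allele
phenotype_candidates = candidates(index, "Phenotype")  # only loci with tag Gene
```

Only the objects returned by candidates would then be opened to evaluate the rest of the query.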

----

To summarize, the indexing can answer in time zero:
  yes, no, may-be
and the may-be objects have to be accessed.
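
One way to picture the three answers, as a hedged Python sketch: `present` is the set of indexed tags recorded in an object's fingerprint, `anchor` is the first indexed tag at or above the tag asked for, and `open_object` is a hypothetical callback that reads the object from disk and returns all its tags. None of these names come from acedb.

```python
from enum import Enum


class Answer(Enum):
    YES = "yes"
    NO = "no"
    MAYBE = "may-be"


def probe(present, tag, anchor):
    """Answer a tag-presence question from the fingerprint alone."""
    if anchor not in present:
        return Answer.NO      # the indexed tag above is missing, so `tag` is too
    if anchor == tag:
        return Answer.YES     # `tag` is itself indexed and present
    return Answer.MAYBE       # only the object itself can tell


def resolve(name, present, tag, anchor, open_object):
    """Use the index when it suffices; open the object only on may-be."""
    answer = probe(present, tag, anchor)
    if answer is Answer.MAYBE:
        return tag in open_object(name)
    return answer is Answer.YES
```

In this sketch the disk is touched only for the may-be case; yes and no are decided in memory.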

The real difficulty would be to use the info of the filters
when presented with another arbitrary query.

I regard this problem as too difficult.

If you are interested, I can give more details.





