More on Jean's models

kirbym at har-rbu.mrc.ac.uk
Tue Nov 16 11:44:10 EST 1993


Having been away for a week I see that there has been some discussion
about the ACeDB ?Map model definition. As I had to get a demonstration
ready recently for the mouse data I asked Richard for the new 2.0 map code.
This linked in well with the rest of the ACeDB code but of course used the
new (or a version of the new) model definitions.

Having spent some considerable time implementing models and code for the
cytogenetic map, the following comments are based on my own experience
and how I see my data fitting in with the new ?Map model definition.
The mouse cytogenetic data includes loci with locations derived mainly from
hybridisation in situ (HIS) and somatic cell (SC) techniques and chromosomal
anomalies (or rearrangements) with cytogenetic breakpoints.


It is good that the ?Map can be defined as being of type Genetic, Cytogenetic
or Physical. However, for the cytogenetic map, I feel the proposed definitions
for ?Locus and ?Map_location need some more thought.

   > ?Locus  Map ?Map XREF Locus #Map_location
   >         Main_Marker ?Map XREF Main_Marker
   >         Inside  Inside_YAC  ?YAC XREF Contains
   >                 Inside_Fragment ?Fragment XREF Locus
   >                 Chrom_Band ?Chrom_Band XREF Locus

   > ?Map_location UNIQUE Position UNIQUE Float #map_error
   >                      Multi_Position Float  #map_error
   >                      Ends Left UNIQUE Float  #map_error
   >                           Right UNIQUE Float  #map_error

1. Presumably a Locus with a Chrom_Band tag defined will be referring to
   a cytogenetic location with the data coming from either HIS or SC or
   some other technique. How will this ?Locus definition handle genes
   whose locations have been determined by several different techniques
   and the results show different chromosome band(s)? Or even, what if
   you have data from several experiments of the same type, eg HIS, and
   the results differ?

   Using the 2.0 models and code, I ended up defining separate objects of
   class ?Interval for each locus location. Through appropriate tags and
   cross referencing, each location object could be linked back to its
   parent locus and each parent locus could list its cytogenetic location(s).
   One advantage of this is that you can add to the object relevant comments
   and references.
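
   To make the approach in point 1 concrete, here is a minimal Python sketch
   of one location object per experiment, each linked back to its parent
   locus. It is not ACeDB model syntax or ACeDB code: the class names, field
   names and example data are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Interval:
    """One cytogenetic location for a locus, from one experiment (hypothetical)."""
    name: str
    locus: str                     # back-reference to the parent locus name
    technique: str                 # e.g. "HIS" or "SC"
    band_from: str
    band_to: Optional[str] = None  # None when a single band is reported
    comments: List[str] = field(default_factory=list)
    references: List[str] = field(default_factory=list)


@dataclass
class Locus:
    """A locus that lists every location reported for it."""
    name: str
    locations: List[Interval] = field(default_factory=list)

    def add_location(self, technique, band_from, band_to=None):
        # Create the location object and link it both ways,
        # in the spirit of an XREF between the two classes.
        interval = Interval(
            name=f"{self.name}.loc{len(self.locations) + 1}",
            locus=self.name,
            technique=technique,
            band_from=band_from,
            band_to=band_to,
        )
        self.locations.append(interval)
        return interval


# Invented example data: two experiments that disagree on the band.
gene = Locus("Mouse_gene")
first = gene.add_location("HIS", "A1", "A3")
second = gene.add_location("SC", "B1")
second.comments.append("Conflicts with the HIS result")
```

   Because each location is its own object, conflicting results sit side by
   side and each can carry its own comments and references.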
   
2. For purely semantic reasons I don't like the definition:
              Chrom_Band ?Chrom_Band XREF Locus
   If the location of a locus is uncertain, two bands may be given to
   specify the range of bands within which the locus may be located,
   eg  Mouse_gene 12A1-A3 or Human_gene 1p35-p31. This definition will
   list the two bands at the ends of the range, thereby implicitly
   stating that the locus has two locations. I would prefer to see:
                       Chrom_Band ?Chrom_Band ?Chrom_Band
   which to my mind says that the locus is located somewhere within the
   range specified by the two bands. If, of course, the locus does have
   more than one location, then each can still be listed.
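
   The difference between "two locations" and "one range" can be shown with
   a small Python sketch that reads a band range as a single value with two
   ends. The notation rules in the regular expression are an assumption
   based only on the two examples above, and the function and type names are
   hypothetical.

```python
import re
from typing import NamedTuple

# Assumed notation: chromosome (digits, X or Y), then a band starting with
# a letter A-H (mouse style) or p/q (human style), optionally followed by
# "-" and a second band on the same chromosome.
_BAND_RANGE = re.compile(r"^(\d+|X|Y)([A-Hpq][\d.]*)(?:-([A-Hpq][\d.]*))?$")


class BandRange(NamedTuple):
    chromosome: str
    first: str
    last: str  # equal to `first` when only one band is given


def parse_band_range(text: str) -> BandRange:
    """Parse e.g. '12A1-A3' into one range rather than two locations."""
    match = _BAND_RANGE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised band notation: {text!r}")
    chromosome, first, last = match.groups()
    return BandRange(chromosome, first, last or first)


mouse = parse_band_range("12A1-A3")
human = parse_band_range("1p35-p31")
single = parse_band_range("12A1")
```

   A locus with several genuine locations would then simply hold several
   such ranges.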

3. Richard explained at the recent Boston ACeDB workshop that the new ?Map
   would have fewer data types. Many of the current data types would be
   composed of essentially two types - a Locus and an Interval. Does the
   proposed model definition do away with the ?Interval class, or even
   the concept? Maybe ?Map_location is replacing that functionality?

4. For chromosome anomalies, the ?Map_location may have to be modified to
   include chromosome bands. Chromosome anomalies with two breakpoints on
   the same chromosome can be treated as Intervals and placed on the
   genetic and/or cytogenetic maps as appropriate, eg deletions, insertions,
   transpositions, etc. (Chromosome anomalies have actually been defined
   with their own class structure but have links to these Intervals where
   appropriate. In this way complex anomalies involving several segments
   can be defined.)

   ie  ?Map_location UNIQUE Position UNIQUE Float #map_error
                            Multi_Position Float #map_error
                            Ends Left_pos UNIQUE Float #map_error
                                 Left_band ?Chrom_Band ?Chrom_Band
                                 Right_pos UNIQUE Float #map_error
                                 Right_band ?Chrom_Band ?Chrom_Band

   Again the two Chrom_Bands are to specify the range if an end point
   is uncertain. However a breakpoint may fall "Within" a band (range)
   or at the "Junction" between two bands. So this is not adequate.

   In fact to properly define anomaly breakpoints you must know the
   anomaly, the anomaly type and which chromosome it falls on. Data may
   be available for both cytogenetic locations and genetic positions. The
   latter may include the loci with which the breakpoint was mapped by
   linkage experiments. To handle all this data for a single breakpoint
   I ended up defining a sub class called ?Anomaly_bkpt which could be
   used by both the parent chromosome anomaly object and the Interval object.

   The disadvantage of using a sub class for the breakpoint data is that
   the data is not easily accessible.
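
   As an illustration of the information a single breakpoint has to carry, a
   minimal Python sketch follows. It is not the actual ?Anomaly_bkpt
   definition; the field names, the `Placement` enum and the example values
   in the usage lines are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Placement(Enum):
    WITHIN = "Within"      # breakpoint lies inside a band or band range
    JUNCTION = "Junction"  # breakpoint lies at the boundary of two bands


@dataclass
class Breakpoint:
    anomaly: str
    anomaly_type: str
    chromosome: str
    placement: Placement
    band_a: str
    band_b: Optional[str] = None
    genetic_position: Optional[float] = None  # from linkage data, if any

    def __post_init__(self):
        # A junction is only meaningful between two named bands.
        if self.placement is Placement.JUNCTION and self.band_b is None:
            raise ValueError("a Junction breakpoint needs two bands")

    def describe(self) -> str:
        if self.placement is Placement.JUNCTION:
            where = f"at the junction of {self.band_a} and {self.band_b}"
        elif self.band_b is None:
            where = f"within {self.band_a}"
        else:
            where = f"within {self.band_a}-{self.band_b}"
        return f"{self.anomaly} ({self.anomaly_type}) chr {self.chromosome}: {where}"


# Invented example values.
uncertain = Breakpoint("T1", "translocation", "12", Placement.WITHIN, "A1", "A3")
boundary = Breakpoint("T1", "translocation", "4", Placement.JUNCTION, "B1", "B2")
```

   Keeping the placement explicit separates "uncertain within a range" from
   "exactly at a band boundary", which two bare band fields cannot express.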

5. Perhaps Jean would like to consider the tags "Within" and "Junction",
   although "Within" and "Between" could be regarded as equivalent.
   Something else to consider (which I have not) is the wonderfully precise
   yet imprecise kind of definition, such as
         "near the proximal end of band xxx"!

6. In answer to John McCarthy's comment about "when will multiple boxes
   refer to the same key?", it might be that an Interval of type anomaly
   is defined both genetically and cytogenetically. It is one object but
   would appear as two separate boxes on the same map - unless the
   cytogenetic map is a completely separate display.

7. Continuing with cytogenetic definitions: one of the things that I have
   done recently is to define some translocation breakpoints with both
   genetic and cytogenetic locations as class ?Locus so that they could
   appear as marker loci on the genetic map but with a line linking their
   genetic position to the chromosome band(s) they were located within.
   (Yes, this is a duplication of data as the translocations are also
   defined as chromosome anomalies but I knew no other way.) Of course,
   translocations have breakpoints on two chromosomes.  The following does
   have some relevance to the new models if paralogous genes with locations
   on several different chromosomes are to eventually be considered.

   The ?Locus definition (Map ?Map Float) handles several chromosomes very
   well. However, because of the model tree structure, any ?Chrom_Band
   definition MUST immediately follow ?Map if the right band on the right
   chromosome on the right map is to be found for a particular locus
   (eg CytoMap ?Map ?Chrom_Band ?Chrom_Band). Because of this, ?Map was
   defined several times in the ?Locus model which I am not happy about.
   Any suggestions? Ideally of course, it should be defined once and in
   such a way that the routine in the gMapDisplay code which checks for a
   valid chromosome and position only has to check in one place.

   Is this relevant to Multi_Position? I am not sure what that is doing.
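
   The wish in point 7, one map reference per locus and one place to check
   it, can be sketched in Python as follows. This is not how gMapDisplay
   works and not ACeDB model syntax; `MapEntry`, `valid_placement`, the map
   names and the positions are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple


@dataclass
class MapEntry:
    """Everything known about a locus on one map, kept under one key."""
    position: Optional[float] = None         # genetic position, if known
    bands: Optional[Tuple[str, str]] = None  # cytogenetic band range, if known


def valid_placement(entries: Dict[str, MapEntry],
                    known_maps: Set[str], map_name: str) -> bool:
    """Single check: the map exists and the locus has some location on it."""
    entry = entries.get(map_name)
    if map_name not in known_maps or entry is None:
        return False
    return entry.position is not None or entry.bands is not None


# Invented data: a translocation with breakpoints on two chromosomes.
known = {"Chr_2", "Chr_8"}
translocation = {
    "Chr_2": MapEntry(position=35.0, bands=("B", "C1")),
    "Chr_8": MapEntry(bands=("A1", "A1")),
}
```

   Each map appears once per locus, and the genetic position and the bands
   are found under the same key, so a display routine has only one place to
   look.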

8. John McCarthy stated "What measurement units will be used for map
   positions? It probably needs to be something relative, like percent
   of the whole map, in order to support recursive nesting of maps."
   That's fine, but the map should be able to support different measurements,
   such as centiMorgans, kilobases, megabases and percent of total chromosome
   length.
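
   One way to reconcile the two requirements is to store positions in the
   map's own unit and convert to a relative fraction only when maps are
   nested. The Python sketch below illustrates this under that assumption;
   `MapScale` is a hypothetical name and the map lengths are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MapScale:
    unit: str      # e.g. "cM", "kb", "Mb"
    length: float  # total map length expressed in that unit

    def to_fraction(self, position: float) -> float:
        """Convert a position in map units to a fraction of the whole map."""
        if not 0.0 <= position <= self.length:
            raise ValueError("position lies outside the map")
        return position / self.length

    def from_fraction(self, fraction: float) -> float:
        """Convert a fraction of the whole map back to map units."""
        return fraction * self.length


# Invented lengths, for illustration only.
genetic = MapScale("cM", 80.0)
physical = MapScale("kb", 160000.0)

relative = genetic.to_fraction(20.0)
same_point_kb = physical.from_fraction(relative)
```

   Note that mapping a genetic position onto a physical one through a shared
   fraction assumes the two maps are proportional, which real maps generally
   are not; the sketch only shows the unit handling.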

9. Yes, I support the use of MappingTags and I like the idea of "Containing"
   heterogeneous data and other Maps or parts of Maps, if it can be done.
   But I would like to see how all this works in practice.

Finally, one thing that does worry me is that as time goes on more people
will be developing code to fulfill functions not already covered by the
main ACeDB code. With the main database in a constant state of flux (which
is not altogether a bad thing), this could mean a lot of work just to
maintain existing efforts, not to mention the "reinventing the wheel" syndrome.
I personally don't mind if ACeDB provides me with all the functionality
needed for my data. But I DO MIND if I spend time working on some aspect
of it only to find two months later that it has all been a complete and
utter waste of time and effort. Could we not have some coordination amongst
the developers?

Michelle Kirby
Email: kirbym at uk.ac.mrc.har-rbu



