Having been away for a week, I see that there has been some discussion
about the ACeDB ?Map model definition. As I had to get a demonstration
ready recently for the mouse data, I asked Richard for the new 2.0 map code.
This linked in well with the rest of the ACeDB code but of course used the
new (or a version of the new) model definitions.
Having spent some considerable time implementing models and code for the
cytogenetic map, the following comments are based on my own experience
and how I see my data fitting in with the new ?Map model definition.
The mouse cytogenetic data includes loci with locations derived mainly from
hybridisation in situ (HIS) and somatic cell (SC) techniques and chromosomal
anomalies (or rearrangements) with cytogenetic breakpoints.
It is good that the ?Map can be defined as being of type Genetic, Cytogenetic
or Physical. However, for the cytogenetic map, I feel the proposed definitions
for ?Locus and ?Map_location need some more thought.

> ?Locus  Map ?Map XREF Locus #Map_location
>         Main_Marker ?Map XREF Main_Marker
>         Inside  Inside_YAC ?YAC XREF Contains
>                 Inside_Fragment ?Fragment XREF Locus
>         Chrom_Band ?Chrom_Band XREF Locus
>
> ?Map_location  UNIQUE  Position UNIQUE Float #map_error
>                        Multi_Position Float #map_error
>                        Ends  Left UNIQUE Float #map_error
>                              Right UNIQUE Float #map_error

1. Presumably a Locus with a Chrom_Band tag defined will be referring to
a cytogenetic location with the data coming from either HIS or SC or
some other technique. How will this ?Locus definition handle genes
whose locations have been determined by several different techniques
and the results show different chromosome band(s)? Or even, what if
you have data from several experiments of the same type, eg HIS, and
the results differ?
Using the 2.0 models and code, I ended up defining separate objects of
class ?Interval for each locus location. Through appropriate tags and
cross referencing, each location object could be linked back to its
parent locus and each parent locus could list its cytogenetic location(s).
One advantage of this is that you can add to the object relevant comments
and references.
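
As a rough illustration of the arrangement described in point 1 (this is not ACeDB code; the class and field names are hypothetical stand-ins for the ?Locus and ?Interval objects), each reported location becomes its own object that points back to its parent locus and can carry its own comments and references:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CytoLocation:
    """One reported cytogenetic location (hypothetical stand-in for an ?Interval)."""
    locus: str                       # back-reference to the parent locus
    technique: str                   # e.g. "HIS" or "SC"
    band_from: str
    band_to: Optional[str] = None    # None when a single band was reported
    comments: List[str] = field(default_factory=list)
    references: List[str] = field(default_factory=list)


@dataclass
class CytoLocus:
    """A locus that lists every location object recorded for it."""
    name: str
    locations: List[CytoLocation] = field(default_factory=list)

    def add_location(self, technique, band_from, band_to=None):
        # Create the location and cross-reference it in both directions.
        loc = CytoLocation(self.name, technique, band_from, band_to)
        self.locations.append(loc)
        return loc

    def conflicting(self):
        """True if the recorded locations do not all give the same band(s)."""
        spans = {(l.band_from, l.band_to or l.band_from) for l in self.locations}
        return len(spans) > 1
```

With this layout, two HIS experiments that disagree simply produce two location objects under the same locus, and `conflicting()` flags the disagreement without either result being lost.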
2. For purely semantic reasons I don't like the definition:
    Chrom_Band ?Chrom_Band XREF Locus
If the location of a locus is uncertain, two bands may be given to
specify the range of bands within which the locus may be located,
eg Mouse_gene 12A1-A3 or Human_gene 1p35-p31. This definition will
list the two bands at the ends of the range, thereby implicitly
stating that the locus has two locations. I would prefer to see:
    Chrom_Band ?Chrom_Band ?Chrom_Band
which to my mind says that the locus is located somewhere within the
range specified by the two bands. If, of course, the locus does have
more than one location, then each can still be listed.
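
The distinction in point 2 can be sketched in Python (a minimal sketch, not part of ACeDB; `BandRange` and `parse_band_range` are hypothetical names, and the pattern only covers notations of the form shown above, such as 12A1-A3 and 1p35-p31). A range is kept as one location with two ends rather than as two separate band entries:

```python
import re
from dataclasses import dataclass

# Assumed band syntax: one letter, optional digits, optional ".digits".
_BAND = r"[A-Za-z]\d*(?:\.\d+)?"
_PATTERN = re.compile(rf"(\d+|X|Y)({_BAND})(?:-({_BAND}))?")


@dataclass(frozen=True)
class BandRange:
    """A single location: somewhere from band `first` to band `last`."""
    chromosome: str
    first: str
    last: str


def parse_band_range(text):
    """Parse e.g. '12A1-A3' into one BandRange, not two band locations."""
    m = _PATTERN.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"unrecognised band notation: {text!r}")
    chromosome, first, last = m.groups()
    # A single band is treated as a range whose two ends coincide.
    return BandRange(chromosome, first, last or first)
```

A locus with genuinely several locations would then hold a list of `BandRange` values, one per location, so a range and a multiple location are never confused.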
3. Richard explained at the recent Boston ACeDB workshop that the new ?Map
would have fewer data types. Many of the current data types would be
composed of essentially two types - a Locus and an Interval. Does the
proposed model definition do away with the ?Interval class, or even
the concept? Maybe ?Map_location is replacing that functionality?
4. For chromosome anomalies, the ?Map_location may have to be modified to
include chromosome bands. Chromosome anomalies with two breakpoints on
the same chromosome can be treated as Intervals and placed on the
genetic and/or cytogenetic maps as appropriate, eg deletions, insertions,
transpositions, etc. (Chromosome anomalies have actually been defined
with their own class structure but have links to these Intervals where
appropriate. In this way complex anomalies involving several segments
can be defined.)
ie ?Map_location  UNIQUE  Position UNIQUE Float #map_error
                          Multi_Position Float #map_error
                          Ends  Left_pos UNIQUE Float #map_error
                                Left_band ?Chrom_Band ?Chrom_Band
                                Right_pos UNIQUE Float #map_error
                                Right_band ?Chrom_Band ?Chrom_Band
Again the two Chrom_Bands are to specify the range if an end point
is uncertain. However, a breakpoint may fall "Within" a band (range)
or at the "Junction" between two bands, so a pair of bands alone is
not adequate.
In fact to properly define anomaly breakpoints you must know the
anomaly, the anomaly type and which chromosome it falls on. Data may
be available for both cytogenetic locations and genetic positions. The
latter may include the loci with which the breakpoint was mapped by
linkage experiments. To handle all this data for a single breakpoint
I ended up defining a sub class called ?Anomaly_bkpt which could be
used by both the parent chromosome anomaly object and the Interval object.
The disadvantage of using a sub class for the breakpoint data is that
the data is not easily accessible.
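
To make the breakpoint requirements in point 4 concrete, here is a minimal Python sketch (not the actual ?Anomaly_bkpt model; all names and fields are hypothetical) of a single record that holds the anomaly, its type, the chromosome, the "Within"/"Junction" distinction and any genetic position, so that both the anomaly and the Interval could share it:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class BandRelation(Enum):
    WITHIN = "Within"      # inside one band, or inside a two-band range
    JUNCTION = "Junction"  # at the boundary between two bands


@dataclass
class AnomalyBreakpoint:
    anomaly: str                          # name of the parent anomaly
    anomaly_type: str                     # e.g. "deletion", "translocation"
    chromosome: str
    relation: Optional[BandRelation] = None
    bands: Tuple[str, ...] = ()
    genetic_position: Optional[float] = None
    linked_loci: Tuple[str, ...] = ()     # loci used in the linkage mapping

    def __post_init__(self):
        # Check that the number of bands matches the stated relation.
        if self.relation is BandRelation.JUNCTION and len(self.bands) != 2:
            raise ValueError("a Junction breakpoint needs exactly two bands")
        if self.relation is BandRelation.WITHIN and not 1 <= len(self.bands) <= 2:
            raise ValueError("a Within breakpoint needs one band or a two-band range")
```

The check in `__post_init__` is only one possible rule; the point is that the relation tag, not the number of bands listed, says how the bands should be read.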
5. Perhaps Jean would like to consider the tags "Within" and "Junction",
although "Within" and "Between" could be regarded as equivalent.
Something else to consider (which I have not) is the set of wonderfully
precise, yet imprecise, definitions such as
"near the proximal end of band xxx"!
6. In answer to John McCarthy's comment about "when will multiple boxes
refer to the same key?", it might be that an Interval of type anomaly
is defined both genetically and cytogenetically. It is one object but
would appear as two separate boxes on the same map - unless the
cytogenetic map is a completely separate display.
7. Continuing with cytogenetic definitions: one of the things that I have
done recently was to define some translocation breakpoints with both
genetic and cytogenetic locations as class ?Locus so that they could
appear as marker loci on the genetic map but with a line linking their
genetic position to the chromosome band(s) they were located within.
(Yes, this is a duplication of data as the translocations are also
defined as chromosome anomalies but I knew no other way.) Of course,
translocations have breakpoints on two chromosomes. The following does
have some relevance to the new models if paralogous genes with locations
on several different chromosomes are to eventually be considered.
The ?Locus definition (Map ?Map Float) handles several chromosomes very
well. However, because of the model tree structure, any ?Chrom_Band
definition MUST immediately follow ?Map if the right band on the right
chromosome on the right map is to be found for a particular locus
(eg CytoMap ?Map ?Chrom_Band ?Chrom_Band). Because of this, ?Map was
defined several times in the ?Locus model which I am not happy about.
Any suggestions? Ideally of course, it should be defined once and in
such a way that the routine in the gMapDisplay code which checks for a
valid chromosome and position only has to check in one place.
Is this relevant to Multi_Position? I am not sure what that is doing.
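
The "defined once, checked in one place" idea in point 7 could look something like the following Python sketch (purely illustrative; it does not reflect the gMapDisplay code, and every name here is hypothetical). Each map a locus appears on has exactly one entry, which carries the genetic position and/or the band range for that map:

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class MapPlacement:
    """Everything known about a locus on one particular map."""
    position: Optional[float] = None          # genetic position, if any
    bands: Optional[Tuple[str, str]] = None   # band range, if any


@dataclass
class PlacedLocus:
    name: str
    # One entry per map (chromosome), keyed by map name.
    placements: Dict[str, MapPlacement] = field(default_factory=dict)


def placement_on(locus, map_name):
    """Single check point: return the placement on `map_name`, or None
    if the locus has no usable position or bands on that map."""
    p = locus.placements.get(map_name)
    if p is None or (p.position is None and p.bands is None):
        return None
    return p
```

A translocation breakpoint on two chromosomes, or a set of paralogous genes, would simply have two or more keys in `placements`, and the display code would only ever call `placement_on`.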
8. John McCarthy stated "What measurement units will be used for map
positions? It probably needs to be something relative, like percent
of the whole map, in order to support recursive nesting of maps."
That's fine, but the map should be able to support different measurements,
such as centiMorgans, kilobases, megabases and percent of total chromosome
length.
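
One way to reconcile point 8 with a relative internal scale is sketched below in Python (an assumption-laden illustration, not ACeDB behaviour; `MapScale` and `convert` are hypothetical names). Each map keeps its own unit and total length, and a position is turned into a fraction of the map only when maps have to be nested or compared. Note that this is a purely proportional rescaling for display; it does not claim that genetic and physical distances are actually proportional.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MapScale:
    unit: str      # e.g. "cM", "kb", "Mb" or "percent"
    length: float  # total length of the map in that unit

    def __post_init__(self):
        if self.length <= 0:
            raise ValueError("map length must be positive")

    def to_fraction(self, position):
        """Position in this map's own unit -> fraction of the whole map."""
        return position / self.length

    def from_fraction(self, fraction):
        """Fraction of the whole map -> position in this map's own unit."""
        return fraction * self.length


def convert(position, source, target):
    """Rescale a position from one map's unit to another, proportionally."""
    return target.from_fraction(source.to_fraction(position))
```

Here a map stored in centiMorgans and one stored in megabases can both be drawn on a percent scale without either losing its native unit.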
9. Yes, I support the use of MappingTags and I like the idea of "Containing"
heterogeneous data and other Maps or parts of Maps, if it can be done.
But I would like to see how all this works in practice.
Finally, one thing that does worry me is that as time goes on more people
will be developing code to fulfill functions not already covered by the
main ACeDB code. With the main database in a constant state of flux (which
is not altogether a bad thing), this could mean a lot of work just to
maintain existing efforts, not to mention the reinventing-the-wheel syndrome.
I personally don't mind if ACeDB provides me with all the functionality
needed for my data. But I DO MIND if I spend time working on some aspect
of it only to find two months later that it has all been a complete and
utter waste of time and effort. Could we not have some coordination amongst
the developers?
Michelle Kirby
Email: kirbym at uk.ac.mrc.har-rbu