Hello,
I joined the ACEDB community only recently (about a month ago), so bear with me
if I make erroneous assumptions or outright mistakes.
Here are a few comments prompted by Michelle Kirby's message from a couple of
days ago.
I'm currently working with Peter Little's group on chromosome 11, specifically
to assess whether acedb can be used to display our data. Peter's ideas on how
the data should be displayed are as follows:
1. Because much of our data describes relationships between genes, cosmid
clones, YACs, etc., there is little or no chance of mapping anything to specific
points on the chromosome; it is far simpler to map things to intervals along
the chromosome (i.e. everything is mapped relative to its neighbours rather than
to predetermined numerical positions).
2. As more is learned about the order of loci/genes etc. along the chromosome,
this data can be entered and will remain fixed until further information leads
to a change in that order.
With the above in mind, it would be nice if acedb had a display function that
could recognise terms such as "left of", "right of" or "between" (yes, I don't
use proximal or distal too often) and create a map based solely on these
positional pointers, rather than relying on exact numerical locations to
position objects properly. This would require something akin to a linked list
(my C/C++ programming is practically nil, so I use this term in the hope that it
is correct), so that the order in which data is read into the display matrix is
irrelevant: each item is placed in the matrix at the required point relative to
its neighbours, and displayed on an arbitrary scale where distance data is not
known, and at specific separations where map distance or sequence length data
is known.
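To make the ordering idea concrete, here is a minimal Python sketch, assuming
each "left of" statement is stored as a pair (a, b) meaning "a lies left of b",
and that "between" is expanded into two such pairs. The names order_loci and
between are hypothetical and not part of acedb, and the sketch swaps in a
topological sort (Kahn's algorithm) for a literal linked list. It recovers only
the order, not the spacing on the display.

```python
from collections import defaultdict, deque


def order_loci(left_of_pairs):
    """Return one left-to-right order consistent with every (a, b) pair,
    where a pair means "a lies left of b". The order in which pairs are
    supplied does not matter. Raises ValueError on contradictory input."""
    successors = defaultdict(set)   # locus -> loci known to lie to its right
    indegree = {}                   # locus -> number of loci known to its left
    seen = []                       # first-seen order, used only to break ties
    for a, b in left_of_pairs:
        for n in (a, b):
            if n not in indegree:
                indegree[n] = 0
                seen.append(n)
        if b not in successors[a]:
            successors[a].add(b)
            indegree[b] += 1

    queue = deque(n for n in seen if indegree[n] == 0)
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for m in sorted(successors[n], key=seen.index):
            indegree[m] -= 1
            if indegree[m] == 0:
                queue.append(m)

    if len(order) != len(seen):
        raise ValueError("contradictory left-of/right-of statements")
    return order


def between(x, left, right):
    """Express "x lies between left and right" as two left-of pairs."""
    return [(left, x), (x, right)]
```

For instance, points 1, 2 and 3 in order, with a, b and c between 1 and 2 and
the order b, a, c, yield 1, b, a, c, 2, 3 however the pairs are supplied. Where
the order is genuinely unknown (d-h in the example), this sketch simply falls
back on first-seen order; a real display would group such loci instead.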
This approach is a little more general than the one you propose, but it could
probably be tailored more easily to a wider variety of mapping problems that
involve relative positions rather than point locations.
An example would be:
Genes a-h are known to map to interval 1-3, with a, b and c mapping to interval
1-2 within interval 1-3. The order of a, b and c has been determined as b, a, c,
and nothing further is known about d-h.
The display Peter and I envisage would look something like the following:
1 -
| b
| a
| c
2 - d,e,f,g,h
|
|
|
|
|
|
3 -
If it were later found that d and e map closer to 3 than to 2, interval 2-3
could be broken down by introducing new points 2.1 and 2.2, with d and e mapped
between 2.2 and 3, and f-h mapped between 2 and 2.1.
Repeating this process would lead to smaller and smaller intervals until one
eventually arrives at the actual sequence-level display to which a particular
gene or locus maps.
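The subdivision step could be modelled, again only as a hypothetical sketch, by
keeping an ordered list of named boundary points and assigning each locus to the
span between two of them. Subdividing inserts new points without disturbing
existing placements, so a locus stays "somewhere in 2-3" until it is reassigned.
RelativeMap, place, subdivide and loci_in are illustrative names, not acedb
features.

```python
from dataclasses import dataclass, field


@dataclass
class RelativeMap:
    """Hypothetical sketch: named boundary points kept in left-to-right order,
    with each locus assigned to the span between two of those points."""

    points: list[str]
    loci: dict[str, tuple[str, str]] = field(default_factory=dict)

    def place(self, locus: str, left: str, right: str) -> None:
        """Assign (or reassign) a locus to the span between two points."""
        if self.points.index(left) >= self.points.index(right):
            raise ValueError(f"{left} is not left of {right}")
        self.loci[locus] = (left, right)

    def subdivide(self, left: str, right: str, new_points: list[str]) -> None:
        """Insert new points between two adjacent points; placements that
        refer to the outer points remain valid."""
        i = self.points.index(left)
        if i + 1 >= len(self.points) or self.points[i + 1] != right:
            raise ValueError(f"{left}-{right} is not a single interval")
        self.points[i + 1:i + 1] = list(new_points)

    def loci_in(self, left: str, right: str) -> list[str]:
        """Loci placed in exactly the span left-right, in insertion order."""
        return [name for name, span in self.loci.items() if span == (left, right)]
```

Replaying the example: place d-h in 2-3, subdivide 2-3 with 2.1 and 2.2, then
move d and e to 2.2-3 and f-h to 2-2.1. The points then read 1, 2, 2.1, 2.2, 3,
and no locus is left in the undivided span 2-3.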
A tentative model might look something like:
?Interval Name text
          Position Left_of  ?Interval XREF Right_of #overlap
                   Right_of ?Interval XREF Left_of  #overlap
Here overlap is a constructed type indicating the degree to which the intervals
overlap (a rough percentage would do).
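For readers less familiar with XREF, the following hypothetical Python sketch
mimics what the two tags above are meant to do: recording "A Left_of B"
automatically records the inverse "B Right_of A", both carrying the same overlap
value. Representing #overlap as a rough percentage follows the suggestion above;
Interval and set_left_of are illustrative names only.

```python
from dataclasses import dataclass, field


@dataclass
class Interval:
    """Hypothetical mirror of the proposed ?Interval class. Each dict maps a
    neighbouring interval's name to a rough percentage overlap, standing in
    for the #overlap constructed type."""

    name: str
    left_of: dict[str, float] = field(default_factory=dict)
    right_of: dict[str, float] = field(default_factory=dict)


def set_left_of(a: Interval, b: Interval, overlap: float = 0.0) -> None:
    """Record "a Left_of b" and, as an XREF tag would, the inverse
    "b Right_of a" with the same overlap."""
    if not 0.0 <= overlap <= 100.0:
        raise ValueError("overlap must be a percentage between 0 and 100")
    a.left_of[b.name] = overlap
    b.right_of[a.name] = overlap
```

Keeping both directions in step means a display routine could walk from any
interval to its neighbours on either side without maintaining a separate index.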
Please feel free to contact me if anything above is unclear, and I'll try to
explain it better.
Benedict Arnold (email b.arnold at ic.ac.uk)