In article <3BDEA54C.FD71A337 at sanger.ac.uk>,
Richard Durbin <rd at sanger.ac.uk> wrote:
>We have talked for some time about ways to support an alternative exon
>implementation for acedb/fmap more in line with what is standard
>elsewhere, i.e. having separate exon objects, with transcripts linking
>to the exons that they contain. Time to have this discussion with
>proposals in the open in the acedb newsgroup, I think.
Sounds like a good idea to me!
>Should we also support Intron objects? If so, would either exons or introns be
>acceptable, and what happens if both are given and they are inconsistent?
Storing introns as objects seems a slightly strange concept to me - my
mental model is that introns are "gaps". Certainly technologies such as
BioJava are modelling transcripts as lists of exon objects, and that's
how we modelled them at Incyte too.
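To make the "introns are gaps" view concrete, here is a rough Python sketch. It is not ACeDB or BioJava code; the Exon class and the introns() helper are names I made up for illustration, and coordinates are assumed to be 1-based and inclusive. The point is that if a transcript is just a list of exons, the introns can always be derived, so they never need storing and cannot become inconsistent with the exons.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Exon:
    """Hypothetical exon record: 1-based, inclusive coordinates."""
    start: int
    end: int


def introns(exons: List[Exon]) -> List[Tuple[int, int]]:
    """Derive introns as the gaps between consecutive exons."""
    ordered = sorted(exons, key=lambda e: e.start)
    gaps = []
    for left, right in zip(ordered, ordered[1:]):
        # Only report a gap if the exons are not adjacent or overlapping.
        if right.start > left.end + 1:
            gaps.append((left.end + 1, right.start - 1))
    return gaps


transcript = [Exon(1, 100), Exon(201, 300), Exon(451, 600)]
print(introns(transcript))  # [(101, 200), (301, 450)]
```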
>I hasten to add, for reasons of continuity, and because some people strongly like
>all the exon structure information being explicit in the transcript-type objects,
>I expect we will continue to support the current style.
Well, something like this in ?Sequence:
    Source_exons Int Int ?Exon
would be backward compatible with current databases. It works well in my
mind because it separates properties of the transcript (CDS etc, which
should indeed be properties of the ?Sequence object) from properties of
the exon itself (which probably aren't that numerous!).
I suppose you could add some XREFs so that exons could list which
transcripts they belong to, although as a rule I avoid throwing XREFs
around like confetti since they can drastically slow things down
(parsing large numbers of DNA_homol lines, anyone?!)
i.e. something like:

    // in ?Sequence
    Source_exons Int Int ?Exon XREF In_transcript

    ?Exon In_transcript ?Sequence
Sorry, might have the syntax wrong there - I don't have a copy of ACeDB
on my machine here at home.
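For anyone who finds the model syntax hard to picture, here is a small Python sketch of the behaviour I have in mind. It only loosely mimics the idea of a cross-reference and says nothing about how ACeDB actually implements XREFs; ExonStore and add_source_exon are hypothetical names, as are the object names "T1", "T2" and "exon_a".

```python
from collections import defaultdict


class ExonStore:
    """Hypothetical in-memory stand-in for the two proposed tags."""

    def __init__(self):
        # transcript name -> list of (start, end, exon name), like Source_exons
        self.source_exons = defaultdict(list)
        # exon name -> set of transcript names, like In_transcript
        self.in_transcript = defaultdict(set)

    def add_source_exon(self, transcript, start, end, exon):
        # Every forward link also writes the reverse link, which is the
        # extra work that makes heavy cross-referencing expensive.
        self.source_exons[transcript].append((start, end, exon))
        self.in_transcript[exon].add(transcript)


store = ExonStore()
store.add_source_exon("T1", 1, 100, "exon_a")
store.add_source_exon("T1", 201, 300, "exon_b")
store.add_source_exon("T2", 1, 100, "exon_a")  # exon_a shared by T1 and T2

print(sorted(store.in_transcript["exon_a"]))  # ['T1', 'T2']
```

The shared exon is stored once and both transcripts point at it, while the reverse lookup answers "which transcripts use this exon?" without scanning every transcript.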
Thoughts? My idea is pretty simple, but it certainly would solve my
needs, and maintains compatibility with current code.