edgrif at sanger.ac.uk
Wed Sep 6 09:13:05 EST 2000
A reply to Ewan's points...
> (a) componentise heavily. ACeDB should become 4-5 separate projects,
> with their own release schedules and cvs modules, being something like:
> (i) database kernel
> (ii) bioobject layer
> (iii) graphics library
> (iv) fmap
> (v) other viewers
> I would take the "chain saw" approach for componentising these things,
> ie, having some pretty brutal tearing apart of the code, potentially
> copying large amounts of files so that projects can be separated
OK, there are several points here.....
1) large parts of acedb have been carved up into separately compiled libraries,
we started at the bottom and produced a library of utilities that just about all
of acedb makes use of. Next we attacked the graphics library and this is also
completely separate (it's actually the basis of the Sanger Centre's "image"
programs for gel analysis/display which have nothing to do with acedb).
This is not to say that there isn't a lot that could be done, see the following
points.
2) I think it's worth adding the following split to the kernel:
kernel ----- models/class code (i.e. the bit that makes acedb what it is)
   |
    ---- storage of objects (read/write to disk)
The kernel is not cleanly divided into routines that implement the models/class
like stuff that makes acedb what it is as a database, and the underlying code to
store/retrieve objects from disk. We have been thinking about this a lot
recently. It may make sense to allow the storage bit to be something else, e.g.
Berkeley DB, MySQL etc., but it doesn't make sense to throw out the models/class
stuff which is precisely what many people like and treasure about acedb.
If we did use a different backend it would enable us to add transactions,
multiple concurrent writing etc. and enable us to make use of a load of existing
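
A minimal sketch of the split in point 2, assuming the models/class code only
ever reaches storage through a small table of function pointers. Every name here
(StorageBackend, MemStore, save_object, fetch_object) is hypothetical and not
taken from the acedb source; the in-memory store is a toy stand-in for the real
disk code or for something like Berkeley DB.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical interface: the only thing the models/class layer sees. */
typedef struct StorageBackend {
    int (*put)(void *state, const char *key, const char *data);
    const char *(*get)(void *state, const char *key);
    void *state;
} StorageBackend;

/* Toy in-memory backend, standing in for disk, Berkeley DB, MySQL etc. */
#define MEM_SLOTS 8
#define MEM_LEN   64

typedef struct MemStore {
    char keys[MEM_SLOTS][MEM_LEN];
    char vals[MEM_SLOTS][MEM_LEN];
    int used;
} MemStore;

static int mem_put(void *state, const char *key, const char *data)
{
    MemStore *m = state;
    int i;
    if (strlen(key) >= MEM_LEN || strlen(data) >= MEM_LEN)
        return -1;
    for (i = 0; i < m->used; i++)          /* overwrite an existing key */
        if (strcmp(m->keys[i], key) == 0)
            break;
    if (i == m->used) {                    /* otherwise take a new slot */
        if (m->used == MEM_SLOTS)
            return -1;
        strcpy(m->keys[i], key);
        m->used++;
    }
    strcpy(m->vals[i], data);
    return 0;
}

static const char *mem_get(void *state, const char *key)
{
    MemStore *m = state;
    int i;
    for (i = 0; i < m->used; i++)
        if (strcmp(m->keys[i], key) == 0)
            return m->vals[i];
    return NULL;
}

static StorageBackend mem_backend(MemStore *m)
{
    StorageBackend b;
    b.put = mem_put;
    b.get = mem_get;
    b.state = m;
    return b;
}

/* Models/class layer: knows about classes and object names, not storage. */
static int make_key(char *key, size_t len, const char *cls, const char *name)
{
    int n = snprintf(key, len, "%s:%s", cls, name);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

static int save_object(StorageBackend *b, const char *cls,
                       const char *name, const char *body)
{
    char key[MEM_LEN];
    assert(b && cls && name && body);
    if (make_key(key, sizeof key, cls, name) != 0)
        return -1;
    return b->put(b->state, key, body);
}

static const char *fetch_object(StorageBackend *b, const char *cls,
                                const char *name)
{
    char key[MEM_LEN];
    assert(b && cls && name);
    if (make_key(key, sizeof key, cls, name) != 0)
        return NULL;
    return b->get(b->state, key);
}
```

Swapping the backend would then mean filling in a different StorageBackend,
with save_object and fetch_object left untouched.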
3) bio-object layer - I'm not sure what is meant here, if it means "provide a
layer that serves up sequences, exons etc. in some standard format", then this
doesn't exist at the moment. To take the example of fmap, fmap in doing its
display does a whole lot of operations on the database to gather all the information
that it needs to do the display. The intelligence about what to do with
exon-like data etc. resides in fmap. fmap could be changed to split out bits of
code that prepared the sequence from the code that displayed it but this would
be an extremely non-trivial task and would require an excellent description of
what bio-objects were required to make this worthwhile (volunteers?).
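
To make the "prepare" versus "display" split in point 3 concrete, here is a
rough sketch. It assumes a made-up Feature struct and exon coordinates that are
already sorted and non-overlapping; none of this is how fmap is actually
written, and the names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef enum { FEATURE_EXON, FEATURE_INTRON } FeatureType;

/* Hypothetical "bio-object": a typed interval on a sequence. */
typedef struct Feature {
    FeatureType type;
    int start, end;          /* inclusive coordinates */
} Feature;

/* Preparation: turn raw exon coordinates into an ordered exon/intron list.
 * Knows about biology, knows nothing about drawing.
 * Returns the number of features written, or -1 if out[] is too small. */
static int prepare_features(const int exons[][2], int n_exons,
                            Feature *out, int max_out)
{
    int i, n = 0;
    for (i = 0; i < n_exons; i++) {
        /* A gap between the previous exon and this one becomes an intron. */
        if (i > 0 && exons[i][0] > exons[i - 1][1] + 1) {
            if (n == max_out)
                return -1;
            out[n].type = FEATURE_INTRON;
            out[n].start = exons[i - 1][1] + 1;
            out[n].end = exons[i][0] - 1;
            n++;
        }
        if (n == max_out)
            return -1;
        out[n].type = FEATURE_EXON;
        out[n].start = exons[i][0];
        out[n].end = exons[i][1];
        n++;
    }
    return n;
}

/* Display: render prepared features as text. Knows nothing about where the
 * features came from. Returns 0 on success, -1 if buf is too small. */
static int render_text(const Feature *f, int n, char *buf, size_t len)
{
    size_t used = 0;
    int i;
    if (len == 0)
        return -1;
    buf[0] = '\0';
    for (i = 0; i < n; i++) {
        int w = snprintf(buf + used, len - used, "%s%c[%d-%d]",
                         i ? " " : "",
                         f[i].type == FEATURE_EXON ? 'E' : 'I',
                         f[i].start, f[i].end);
        if (w < 0 || (size_t)w >= len - used)
            return -1;
        used += (size_t)w;
    }
    return 0;
}
```

A graphical viewer and a text dump could then share prepare_features and
differ only in the rendering function.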
4) CVS stuff...
> The new projects, in my mind, would not be run via the classic ACeDB
> source code management system + makefile, but a more vanilla cvs with
> autoconf style configurations, lowering the problems of entry for new
> developers to work with you guys.
The acedb cvs stuff _is_ completely standard. We do use cover scripts which
issue cvs calls, but this is to stop screw-ups in the way people create, destroy,
move, edit etc. files within the source tree. To access cvs directly people need
to contact us to get ssh access to the Sanger Centre, that's all.
The build is not standard, it was produced to allow fast building on many
architectures simultaneously. We badly need to use autoconf and we've more or
less worked out how to get the best of both worlds. Autoconf is a must to help
users with install/compile on machines for which we don't provide binaries.
> From the little that I know of ACeDB, I think the code perhaps reuses code
> *too much* at a fine granularity, which then causes problems in reusing
> large components.
No, this is not the problem. Code structure has been the main problem: acedb did
not have a clear enough separation between libraries and the applications using
them. It used #defines to munge the code into different behaviours for different
applications which completely blurred application and library. This has made
reuse difficult but is a situation that has improved enormously over the last 2
years, but it is still not possible to take fmap as a separate component for
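
As an illustration of the #define problem described above, and of one common
way out of it, here is a hedged sketch. The function and macro names are
invented for the example and are not acedb's: the commented-out version is the
kind of code where one library source is compiled differently per application,
and the live version lets the application register its behaviour at run time
so the library is compiled once.

```c
#include <assert.h>
#include <stddef.h>

/* The style being criticised, roughly (hypothetical names):
 *
 *   void reportError(const char *msg)
 *   {
 *   #ifdef GRAPHICAL_APP
 *       popUpDialog(msg);
 *   #else
 *       fprintf(stderr, "%s\n", msg);
 *   #endif
 *   }
 *
 * The library cannot be built without knowing which application it is for.
 */

/* Alternative: the library exposes a hook and knows nothing about callers. */
typedef void (*ErrorHandler)(const char *msg, void *user_data);

static ErrorHandler current_handler = NULL;
static void *current_data = NULL;

/* Called by the application once at start-up. */
void lib_set_error_handler(ErrorHandler handler, void *user_data)
{
    current_handler = handler;
    current_data = user_data;
}

/* Called from inside the library. Returns 1 if a handler ran, else 0. */
int lib_report_error(const char *msg)
{
    assert(msg != NULL);
    if (current_handler == NULL)
        return 0;
    current_handler(msg, current_data);
    return 1;
}

/* Application side: this example handler just counts the errors it sees. */
static void count_errors(const char *msg, void *user_data)
{
    (void)msg;
    ++*(int *)user_data;
}
```

A graphical application would register a handler that pops up a dialog, a
command-line tool one that writes to stderr, and both would link the same
compiled library.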
> (b) plot a course for each component such that the end point is to be "one
> of the best open projects" in this area. Take note of what is out there
> already, and decide whether it is better to merge with other open
> source projects, discard projects or forge ahead. Here are some ideas:
> - make the database kernel the best open source XML database;
> reading/writing XML and XMLSchema; Bindings to perl/python/java; JDBC
> bindings if it is mappeable.
> - there are XML databases out there but I don't know if they are
> that good (the open source ones). Is XMLSchema good? Or established?
> Certainly DTD support would have to be put in.
xml output is on the way but I am still not sure what exactly we want from xml
output, we could quickly/cheaply do an xml version of an ace file which I guess
we will do, but what does this buy you apart from a warm, fuzzy feeling....
> - make the bio objects lean, mean tight objects able to sit on top of
> acedb or another database; able to coexist with other schemes well
hmmmm, the thought of fmap on top of Oracle, warms the cockles of your heart, or
something like that...
> - there is EMBOSS, an established C bio-objects system. There are
> the bioperl/biopython/biojava projects.
OK, I don't know about EMBOSS, someone else could comment on that. biojava
already has sophisticated links to acedb, biopython is on the way, aceperl
provides access to acedb but I don't know what the links between bioperl and
aceperl are (over to you Ewan and Lincoln).
> I would hate to use ACEDB at the moment, but that doesn't mean I can't see
> the potential for many aspects of the software. I believe that it is time
> for your guys to be bold and go forward. I have no doubt that this is
> going to be difficult in many areas; some tough decisions await you guys,
> but it is better to make bold moves than none at all.
Well I guess that acedb does have some difficult decisions coming up but it's
worth noting that many criticisms of acedb centre around:
1) It won't scale
2) It doesn't have concurrent write access
3) It doesn't support transactions
4) It doesn't run on Macs
5) The docs are not good enough
i.e. a totally different set of problems from the ones identified by Ewan.
| Ed Griffiths, Acedb development, Informatics Group, |
| The Sanger Centre, Wellcome Trust Genome Campus, |
| Hinxton, Cambridge CB10 1SA, UK |
| email: edgrif at sanger.ac.uk URL: http://www.sanger.ac.uk/Users/edgrif |
| Tel: +44-1223-494780 Fax: +44 1223 494919 |