Dear all,
the following document written by Mark Blaxter relates to the future
development of computing resources to service the WHO Parasite Genome
Initiatives (including that for schisto). It is of potential relevance to
anyone who uses data generated by the schisto initiative (potentially
anyone doing molecular work on schisto or other platyhelminthes). We would
appreciate any comments that you might have on the proposals / options
outlined in the document.
Please reply to Mark Blaxter (mark.blaxter at ed.ac.uk)
**********************************************************************
Currently, we have a WHO TDR funded post [staffed by Martin Aslett], based
at the EBI, Hinxton, which provides a core support role in developing and
maintaining parasite genome databases, and keeping the parasite-genome
world wide web site available and up-to-date. We also have most of the
parasite-genome acedb databases on the www via HGMP at Hinxton. This post
is supplemented by individual parasite genome research labs which are
generating and analysing genome data, and initiating and carrying out
database and other informatics projects. The WHO grant for this work is up
for renewal this year and if we are to reapply for funding it is essential
that we have a clear strategy for the next three years.
As each genome project starts reaping its successes, and thus enters a
"second generation" of activity [with kilobases to megabases of genome
sequence each on the cards for the next few years], our needs for genome
computing resources will soon outstrip current capacity.
Obviously it is possible for each project to go it alone and develop and
maintain expertise and resources independently. An alternative model is to
continue to co-operate [as we have done so far] and to pool scarce
resources in some way to yield better returns [services, outcomes] for all.
I can see three possible sectors where the different genome projects could
co-operate in genome bioinformatics:
1 a core world wide web site with on-line access to databases,
maintained at a single location but mirrored elsewhere
2 analysis of EST and genome sequence [the role of finishers and
analysts/annotators]
3 development and testing of new genome analysis tools geared to the
specific needs of the genomes [base composition, gene organisation] and
researchers [antigen identification, metabolic nets for drug target
identification, etc].
There are [at least] two ways these needs could be met: through a core
computing resource centre [all genomes analysed at one site] or by a
distributed network of bioinformatics workers. Both approaches have
benefits and costs/bugs - the proper route is most probably in-between
these extremes.
The core centre might offer benefits in:
1 a single site for access to all information
2 shared costs [e.g. of staff who might be only partially employed on
a single genome project]
3 critical mass [basing a number of staff at a single site will both
allow fruitful cross-fertilisation and attract better candidates] and
access to peer group
4 being more attractive to external funding agencies [a funding body
may be more likely to fund half a salary for one particular genome if the
staff member will be in a recognised centre]
The core might also be able to be based in a larger unit, such as the EBI
or NCBI, thus using the leverage of the other bioinformatics expertise to
advance parasitic interests.
The core centre idea also has disadvantages:
a the centre would have a geographical location which might be
unattractive to some funding agencies
b the centre would be based outside the labs where the data was being
generated [in many cases] and thus would be less well integrated with the
overall effort. This problem might be acute in the interpretation of data
with respect to the particular biology of the parasites.
c there may be issues of priority and co-ordination [If two cosmids
get sequenced, which gets analysed first? How do the genome projects ensure
that their priorities are recognised?]
The distributed network of bioinformatics workers might have the following benefits:
1 the informatics expertise would be based in the genome labs
allowing rapid transfer of concepts and data in both directions [biology ->
bioinformatics and vice versa]
2 appropriate for local funding, particularly in building up local
expertise in genomic computing
3 a greater variety of approaches would be tried and thus potential
pitfalls and problems identified more readily
Problems from the distributed model might be:
a the individual bioinformatics workers would be relatively isolated,
having to reinvent wheels already designed by others
b the lack of a clear peer group [e.g. a major bioinformatics centre]
might stifle innovation and application of advances
c costs might be higher [each centre requiring hardware, etc]
We are not currently biased towards either approach, but feel that it is
timely to discuss this. Can we devise a scheme which will give us most of
the benefits and few of the disadvantages?
Perhaps a sensible outcome would be to concentrate bioinformaticists in a
core facility when and where possible [e.g. with the Sanger Centre involved
in several projects it might make sense to co-ordinate bioinformatics
efforts there or at the EBI next door], while maintaining a distributed
effort based in the other genome sequencing labs.
We could have an annual gathering / workshop for all the
informaticists, so that some of the benefits of the global informal college
/ peer group / critical mass can be reaped. This might also be achieved
through the funding of a core resource staff who could train other
informaticists (say over a 6 month period) at the core site, before they
set themselves up in their home labs/countries. This revolving door would
both bring new ideas and resources into the projects and keep links between
projects alive.
One issue is of course funding. We can ask the WHO to continue to fund the
support post, and perhaps also other informatics posts per genome. These
funding requests could include funds for the six-month rotation
idea. In Europe we could approach European or country-based funding bodies
to supplement these posts with full time positions at the Sanger/EBI [if
that was the chosen centre].
We would also appreciate any comments you have on the current
role/performance of the WHO genomes computing support.
Thanks
Mark Blaxter
***********************************************************
David A. Johnston,
Secretary to the WHO Schistosoma Genome Network,
Biomedical Parasitology Division,
Dept. of Zoology,
The Natural History Museum,
Cromwell Road, London SW7 5BD, England, UK.
Tel: 0171 9389297 (from outside the UK: 44 171 9389297)
Fax: 0171 9388754 (from outside the UK: 44 171 9388754)
eMail daj at nhm.ac.uk
http://www.nhm.ac.uk/schisto
The Biomedical Parasitology Division is a WHO Collaborating Centre for the
identification of schistosomes and their snail hosts.