Computing support for WHO's Parasite Genome Projects

David Johnston daj at
Tue Apr 7 10:18:55 EST 1998

 Dear all,
 the following document written by Mark Blaxter relates to the future
 development of computing resources to service the WHO Parasite Genome
 Initiatives (including that for schisto). It is of potential relevance to
 anyone who uses data generated by the schisto initiative (potentially
 anyone doing molecular work on schisto or other platyhelminthes). We would
 appreciate any comments that you might have on the proposals / options
 outlined in the document.
 Please reply to Mark Blaxter (mark.blaxter at
 Currently, we have a WHO TDR funded post [staffed by Martin Aslett], based
 at the EBI, Hinxton, which provides a core support role in developing and
 maintaining parasite genome databases, and keeping the parasite-genome
 world wide web site available and up-to-date. We also have most of the
 parasite-genome acedb databases on the www via HGMP at Hinxton. This post
 is supplemented by individual parasite genome research labs which are
 generating and analysing genome data, and initiating and carrying out
 database and other informatics projects. The WHO grant for this work is up
 for renewal this year and if we are to reapply for funding it is essential
 that we have a clear strategy for the next three years.
 As each genome project starts reaping its successes, and thus enters a
 "second generation" of activity [with megabases to kilobases of genome
 sequence each on the cards for the coming years], our needs for genome
 computing resources will soon outstrip current capacity.
 Obviously it is possible for each project to go it alone and develop and
 maintain expertise and resources independently. An alternative model is to
 continue to co-operate [as we have done so far] and to pool scarce
 resources in some way to yield better returns [services, outcomes] for all.
 I can see three possible sectors where the different genome projects could
 co-operate in genome bioinformatics:
 1	a core world wide web site with on-line access to databases,
 maintained at a single location but mirrored elsewhere
 2	analysis of EST and genome sequence [the role of finishers and
 3	development and testing of new genome analysis tools geared to the
 specific needs of the genomes [base composition, gene organisation] and
 researchers [antigen identification, metabolic nets for drug target
 identification, etc].
 There are [at least] two ways these needs could be met: through a core
 computing resource centre [all genomes analysed at one site] or by a
 distributed network of bioinformatics workers. Both approaches have
 benefits and costs/drawbacks; the proper route most probably lies somewhere
 between these extremes.
 The core centre might offer benefits in:
 1	a single site for access to all information
 2	shared costs [e.g. of staff who might be only partially employed on
 a single genome project]
 3	critical mass [basing a number of staff at a single site will both
 allow fruitful cross-fertilisation and attract better candidates] and
 access to a peer group
 4	being more attractive to external funding agencies [a funding body
 may be more likely to fund half a salary for one particular genome if the
 staff member will be in a recognised centre]
 The core could also be based in a larger unit, such as the EBI
 or NCBI, thus using the leverage of the other bioinformatics expertise there to
 advance parasitic interests.
 The core centre idea also has disadvantages:
 a	the centre would have a geographical location which might be
 unattractive to some funding agencies
 b	the centre would be based outside the labs where the data was being
 generated [in many cases] and thus would be less well integrated with the
 overall effort. This problem might be acute in the interpretation of data
 with respect to the particular biology of the parasites.
 c	there may be issues of priority and co-ordination [If two cosmids
 get sequenced, which gets analysed first? How do the genome projects ensure
 that their priorities are recognised?]
 A distributed network of bioinformatics workers might have the following benefits:
 1	the informatics expertise would be based in the genome labs
 allowing rapid transfer of concepts and data in both directions [biology ->
 bioinformatics and vice versa]
 2	appropriate for local funding, particularly in building up local
 expertise in genomic computing
 3	a greater variety of approaches would be tried and thus potential
 pitfalls and problems identified more readily
 Problems with the distributed model might be:
 a	the individual bioinformatics workers would be relatively isolated,
 having to reinvent wheels already designed by others
 b	the lack of a clear peer group [e.g. a major bioinformatics centre]
 might stifle innovation and application of advances
 c	costs might be higher [each centre requiring hardware, etc]
 We are not currently biased towards either approach, but feel that it is
 timely to discuss this. Can we devise a scheme which will give us most of
 the benefits and few of the disadvantages?
 Perhaps a sensible outcome would be to promote a concentration of
 bioinformaticists in a core facility when and where possible
 [e.g. with the Sanger Centre involvement in several projects it might make
 sense to co-ordinate bioinformatics efforts there or at the EBI next
 door?], alongside a distributed effort based in the other genome
 sequencing labs. We could have an annual gathering / workshop for all the
 informaticists, so that some of the benefits of the global informal college
 / peer group / critical mass can be reaped. This might also be achieved
 through the funding of core resource staff who could train other
 informaticists (say over a 6 month period) at the core site, before they
 set themselves up in their home labs/countries. This revolving door would
 both bring new ideas and resources into the projects and keep links between
 projects alive.
 One issue is of course funding. We can ask the WHO to continue to fund the
 support post, and perhaps also other informatics posts per genome. These
 funding requests could include funds for the six-month rotation
 scheme. In Europe we could approach European or country-based funding bodies
 to supplement these posts with full time positions at the Sanger/EBI [if
 that was the chosen centre].
 We would also appreciate any comments you have on the current
 role/performance of the WHO genomes computing support.
 Mark Blaxter
 David A. Johnston,
 Secretary to the WHO Schistosoma Genome Network,
 Biomedical Parasitology Division,
 Dept. of Zoology,
 The Natural History Museum,
 Cromwell Road, London SW7 5BD, England, UK.
 Tel: 0171 9389297 (from outside the UK: 44 171 9389297)
 Fax: 0171 9388754 (from outside the UK: 44 171 9388754)
 eMail daj at
 The Biomedical Parasitology Division is a WHO Collaborating Centre for the
 identification of schistosomes and their snail hosts.

More information about the Schisto mailing list