Future software directions

bailey at hmivax.humgen.upenn.edu
Fri Feb 26 15:05:25 EST 1993


In article <1993Feb25.184958.1 at molbiol.ox.ac.uk>, kuzio at molbiol.ox.ac.uk writes:
> This isn't going to answer anybody's question on what hardware/software to
> buy or obtain, but what I'm wondering is:
>   How long before there is a *complete* gene analysis service offered
>   (commercial or otherwise)?
> Right now you can farm out vast sequencing projects to be done for you
> by a company.  Why not let another company run the data through all
> available analyses (i mean the gene finding, gene comparison, protein
> struc prediction, non-interactive stuff for starts).  If a hit to
> the databank is found, then things like phylogen analysis could
> be included.  The researcher deciphers the results.  Too far fetched?
> 
> Peter's point of knowing what you are doing is valid.  How many biologists
> have the time to become knowledgable on all software?

Reinhard Doelz' followup provides an accurate summary of economic arguments
against this; I'd like to add a few more theoretical opinions.

It's been my experience that the most biologically useful analysis of a
sequence requires a fair amount of biological input along the way.  One could
certainly hire a service to do all possible or potentially meaningful analyses
of a sequence, and then sort out what was biologically relevant afterwards. 
However, I expect the price needed to support the resources required for
every analysis would be prohibitive, especially given that, for the moment,
many analyses are going to turn up little of value.  (For instance, even
allowing for the probability that many researchers aren't optimizing their
search strategies, how often do you see in a sequence report the comment, "a
search of [GenBank|EMBL|the database] revealed no significant homologies"?)

Moreover, present techniques for computed sequence analysis are still pretty
susceptible to noise and bias, so the researcher analyzing the data should have
some awareness of what the algorithm that generated it is really doing, and what
its strengths and weaknesses are.  I'm not saying that every geneticist needs
to be able to write his or her sequence analysis software from the ground up,
but that, just as one should know what the likely artifacts are when looking at
a sequencing gel, one should know what the likely artifacts are when looking at
a database search or a structure prediction.  The learning curve here doesn't
seem too steep, but in my experience many researchers resist learning these
things, because they're 'too busy with wet science' to spend time on this
'ancillary stuff'.

Finally, I'd like to note that something similar to what John Kuzio suggested
does seem to be happening, in that in many places biocomputing services are
being set up within university departments, or contact is slowly expanding
between 'bench' researchers and bioinformatics groups.  While this model does
duplicate many resources at each institution, it may represent a better
model than purely commercial services, since it provides more opportunity for
ongoing interaction between the (usually) biologically more knowledgeable
researcher and the (usually) computationally more knowledgeable informatics
person in the analysis of a sequence.

My $0.02.  Comments are always welcome, but constructive flames only, please.

					Charles Bailey

!-------------------------------------------------------------------------------
!             Dept. of Genetics / Howard Hughes Medical Institute
! University of Pennsylvania School of Medicine  Rm. 430 Clinical Research Bldg.
!     422 Curie Blvd.  Philadelphia, PA 19104 USA      Tel. (215) 898-1699
!          Internet: bailey at genetics.upenn.edu  (IN 128.91.200.37)
!-------------------------------------------------------------------------------



