[Bio-software] Discovery Informatics Forum 2006: Meet the Informatics Solution Providers!

Jon Rees jon.rees at oxfordshirebioscience.net
Wed Oct 4 05:09:55 EST 2006


Dear Colleagues,

You are invited to register and reserve your place online: 
http://www.discoveryforum.net

Spend a day meeting industry experts at the highest level, including 
chief executives and directors of leading discovery informatics solution 
providers, in the centre of London, UK.

It’s free to attend for the biotech industry as well as academia. The 
1st annual Discovery Informatics Forum attracted over 130 delegates last 
year. The meeting is organised by the (not-for-profit) UK Bioinformatics 
Forum, an ISCB-Associated Regional Group.

Discovery Informatics Forum 2006 will include a keynote from Philippe 
Sanseau, currently WW Director of Bioinformatics Discovery and Analysis 
at GlaxoSmithKline, with a tantalising talk entitled: "What Pharma Wants 
from Informatics Solution Providers".

Areas of focus: Drug discovery informatics, data integration and data 
management, high-throughput and high-content screening, and text mining.

Talk synopses have been accepted from BioWisdom, InforSense, Ingenuity 
Systems, Tessella (joint with GSK), Equinox, TransInsight, GenoLogics, 
Linguamatics, ImageInterpret and GeKnown.

Where and When?
===============

Location: DTI Conference Centre, 1 Victoria Street, London, SW1 0ET
Date: 21 November 2006
Time: 8:45 am to 5:00 pm


The Talks:
==========

"Semantic Data Integration for the whole Pharmaceutical Business", Dr 
Steve Gardner, CTO, BioWisdom Ltd
"Statistical Quality Control Throughout the High Throughput Screening 
Process", Dr Ian Bolland, Software Engineer, Tessella and Liz Clark, 
Manager, Screening and Compound Profiling, GSK
"Logic-based Drug Discovery", Prof. Mike Sternberg, Director, Equinox 
Pharma Ltd 2
"Partnering with industry to generate useful and useable tools: GAPP as 
a case study.", Ian Shadforth, Director, GeKnown Solutions
"Ingenuity Pathways Analysis 4.0: Workflows allowing Biomarker 
Discovery", Adam S Corner, Field Application Scientist, Ingenuity Systems
“Flexible High-Content Analysis based on Image Mining – Automatic Image 
Analysis and Image Interpretation of cell pattern”, Horst Perner, CEO, 
ImageInterpret GmbH
"Flexible Text Mining for Knowledge Exploration in the Life Sciences", 
David Milward, Chief Technology Officer, Linguamatics
"Knowledge based search - towards answering questions", Dr. Michael R. 
Alvers, Prof. Dr. Michael Schroeder, TransInsight
"Capturing experimental data in Translational Medicine & System 
Biology", Kevin Jones, European Business Development Manager, GenoLogics

Talk Synopses:
==============

"Logic-based Drug Discovery", Prof. Mike Sternberg, Director, Equinox 
Pharma Ltd
Quantitative Structure-Activity Relationship (QSAR) modelling is widely 
used to identify the key features conferring a molecule's activity and 
to locate structural alerts for toxicity and mutagenicity. Over the 
last few years we have been developing a logic-based approach to 
deriving QSARs that overcomes several of the problems associated with 
many other QSAR methods, including the need for molecular 
superposition, difficulties in handling diverse data sets and the lack 
of chemical insight into the learnt QSARs. The most recent development 
is support vector inductive logic programming (SVILP), which adds 
quantitative modelling to the yes/no results of the earlier logic-based 
approach.
We applied this method to deduce rules that govern binding affinity for 
a series of ligands to a particular protein, and then used these rules 
to predict the affinity of novel ligands. SVILP yields far more 
accurate predictions of affinity in leave-one-out studies than 
industry-standard approaches based on scoring with GoldScore or 
DrugScore. In another study we modelled toxicity using SVILP and 
obtained significant improvements over alternative approaches.
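
For readers unfamiliar with the evaluation protocol mentioned above, 
the sketch below shows a generic leave-one-out loop in Python. It is 
purely illustrative: the fit/predict stand-ins and the toy affinity 
values are invented, and this is not the SVILP method itself.

from statistics import mean

def leave_one_out(ligands, affinities, fit, predict):
    """Hold out each ligand in turn; return (measured, predicted) pairs."""
    pairs = []
    for i in range(len(ligands)):
        train_x = ligands[:i] + ligands[i + 1:]
        train_y = affinities[:i] + affinities[i + 1:]
        model = fit(train_x, train_y)  # train on all but the held-out ligand
        pairs.append((affinities[i], predict(model, ligands[i])))
    return pairs

# Toy stand-ins: a "model" that simply predicts the mean training affinity.
fit = lambda xs, ys: mean(ys)
predict = lambda model, x: model

pairs = leave_one_out(["lig1", "lig2", "lig3"], [6.1, 7.4, 5.9], fit, predict)
print(f"leave-one-out MAE: {mean(abs(y - p) for y, p in pairs):.2f}")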

"Partnering with industry to generate useful and useable tools: GAPP as 
a case study.", Ian Shadforth, Director, GeKnown Solutions
The Genome Annotating Proteomic Pipeline (GAPP) was born out of a close 
collaboration, spanning more than four years, between GlaxoSmithKline 
and Cranfield University. The system is now being marketed by GeKnown 
Solutions, a new spin-out from the university. This integrative 
approach to development has led to a tool that met GSK’s initial needs 
and will help to ensure that the system retains the key attributes of 
usefulness and usability in the future.
What does GAPP do? Proteomics based on tandem mass spectrometry is a 
powerful tool for identifying novel biomarkers and drug targets. The 
output from many current peptide identification algorithms often 
requires human intervention to make calls regarding the presence or 
absence of peptides. Furthermore, such tools promote a piecemeal 
approach to proteomic data analysis, where data is not automatically 
analysed in the context of other data available from the same 
experiment, or other experiments. The GAPP system addresses these 
challenges, by providing a totally automated protein identification 
pipeline with a false positive rate of close to zero, coupled with 
company-wide, useable and complete access to the resultant databank. 
This allows the protein data gathered to be used in ways other than 
those for which it was originally generated, thus further supporting the 
drug discovery pipeline.
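
The synopsis does not say how GAPP's false positive rate is estimated, 
but a common generic approach in peptide identification is target-decoy 
searching. The Python sketch below illustrates that idea only; the 
scores are invented and this is not GAPP's actual method.

def fdr_at_threshold(target_scores, decoy_scores, threshold):
    """Estimate FDR as decoy hits / target hits above a score threshold."""
    targets = sum(1 for s in target_scores if s >= threshold)
    decoys = sum(1 for s in decoy_scores if s >= threshold)
    return decoys / targets if targets else 0.0

# Invented peptide-spectrum match scores against real (target) and
# reversed (decoy) protein sequences.
target = [45.2, 38.9, 52.1, 22.3, 61.7, 19.8]
decoy = [18.4, 21.0, 15.2, 23.9, 12.7, 17.5]
for t in (20, 30, 40):
    fdr = fdr_at_threshold(target, decoy, t)
    print(f"score threshold {t}: estimated FDR {fdr:.2f}")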


"Ingenuity Pathways Analysis 4.0: Workflows allowing Biomarker 
Discovery", Adam S Corner, Field Application Scientist, Ingenuity Systems
Synopsis: There is a significant unmet need for discovery and validation 
of biomarkers. Here we have analysed publicly available gene expression 
and proteomic signatures of rheumatoid arthritis using Ingenuity 
Pathways Analysis (IPA). Genomics and proteomics approaches to biomarker 
discovery are widely employed in the field of rheumatoid arthritis, yet 
are faced with two fundamental challenges: anchoring molecular profiles 
to phenotypes associated with the disease, and integrating data from 
multiple experimental platforms and different preclinical and clinical 
models of disease. Consequently, gene-to-gene comparisons of gene/protein
profiles generated from different studies or different experimental 
platforms often show little overlap. However, placing these profiles in 
the context of pathways, biological functions, and regulatory networks 
from IPA reveals much greater biological consistency between profiles. 
Analysis of these gene expression and protein signatures in IPA provided 
an understanding of which biomarker candidates were functionally 
relevant to rheumatoid arthritis and related biological processes. The 
ability in IPA to clarify the subcellular localisation and tissue 
expression patterns of candidate genes participating in those functions 
and pathways enables identification of a workable number of genes to 
follow up on as potential biomarker candidates.

"Capturing experimental data in Translational Medicine & System 
Biology", Kevin Jones, European Business Development Manager, GenoLogics

Although great strides are continually being made in improving data 
analysis software, visualization tools, algorithms and mining of 
existing knowledge, a better understanding of biology still requires 
scientific experimentation.

In 'omics', these experiments are adapting to new discoveries by 
becoming larger (to improve their statistical power), more complex 
(with more and more analytical steps, novel methods and hybrid 
instrumentation) and more multidisciplinary (to help validate the 
experimental findings), and by incorporating ever more diverse software 
tools (to help progress the experiment).

Our goal at GenoLogics is to develop experiment, workflow and data 
management software that helps laboratories in proteomics, genomics, 
metabonomics and other emerging fields cut through these difficulties, 
leading to better productivity and effectiveness as well as improved 
data quality and provenance. Examples will be given of some of the 
challenges we have overcome in a wide range of proteomics and genomics 
facilities in North America and Europe, and of how we are handling 
experiments involving multiple 'omics' for clinical and translational 
medicine projects.

"How Workflows Cut the Cost of High-Volume Data Analysis" Anthony Rowe, 
Product Manager, InforSense
Synopsis to be supplied.

"Semantic Data Integration for the whole Pharmaceutical Business", Dr 
Steve Gardner, CTO, BioWisdom Ltd
Pharmaceutical companies suffer from a lack of information transparency 
at all stages of the business. R&D activities such as target 
validation, candidate selection and lead optimization, as well as wider 
business issues such as biomarker identification, product 
differentiation and the evaluation of in-licensing opportunities or 
potential toxicity liabilities, all require correlation of complex, 
multi-disciplinary information. This has proved difficult to deliver 
with more traditional integration technologies, but a new generation of 
semantic technologies is delivering transparent integration of 
biological, chemical, textual and business information. This talk will 
discuss these issues and demonstrate how some of them have been solved 
in real-world case studies.

"Statistical Quality Control Throughout the High Throughput Screening 
Process", Dr Ian Bolland, Software Engineer, Tessella and Dr. Isabel 
Coma, Principal Scientist, Screening and Compound Profiling, GSK
Monitoring and analyzing the quality of data in High Throughput 
Screening is becoming ever more complex as volumes increase and 
turn-around times decrease. Over the last two years GSK have been 
developing a Statistical Quality Control system to meet this challenge. 
Key features of this system are: a modular rules-based approach to 
defining statistical measures of quality, provision of rules for both 
the screening process and for individual plates, provision of tools for 
both real-time and offline analysis of the quality data, early warning 
and alerting of process problems, and low-impact integration with GSK's 
existing ActivityBase screening environment. This system, developed 
with Tessella in 2005, is now being used routinely by screening 
scientists to analyze screening plates worldwide at GSK. It has 
facilitated the application of common business rules for QC across 
sites for passing or failing plates before publishing HTS data. "Take 
Home" Message: through the application of a modular rules-based 
Statistical Quality Control system, coupled with a sophisticated 
visualization and data analysis tool, significant improvements in the 
efficiency of the Molecular Screening process can be realized.
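
The synopsis does not name the specific statistical measures used, but 
the Z'-factor is a standard plate-level quality measure in HTS, so a 
plate pass/fail rule of the kind described might look like the 
hypothetical Python sketch below.

from statistics import mean, stdev

def z_prime(pos_controls, neg_controls):
    """Z' = 1 - 3*(sd_pos + sd_neg) / |mean_pos - mean_neg|."""
    separation = abs(mean(pos_controls) - mean(neg_controls))
    return 1 - 3 * (stdev(pos_controls) + stdev(neg_controls)) / separation

def plate_passes(pos_controls, neg_controls, threshold=0.5):
    """A common rule of thumb: pass plates whose Z' exceeds 0.5."""
    return z_prime(pos_controls, neg_controls) >= threshold

# Invented control-well readings (% activity) from one screening plate.
pos = [98.2, 101.5, 99.7, 100.4]
neg = [1.3, 2.1, 0.8, 1.6]
print(f"Z' = {z_prime(pos, neg):.2f}, plate passes: {plate_passes(pos, neg)}")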


“Flexible High-Content Analysis based on Image Mining – Automatic Image 
Analysis and Image Interpretation of cell pattern”, Horst Perner, CEO, 
ImageInterpret GmbH
Synopsis: Microscopic cell images are the basis of a wide variety of 
medical, pharmaceutical and biotechnological applications. Although the 
background of each application, and the statements required from the 
visual appearance of the cells, may differ, from a computer-vision 
point of view these form a single class of applications for which it is 
attractive to develop novel computer-based techniques that 
automatically analyze the images according to the desired statements. 
This is where image mining and automatic image interpretation come into 
play.

"Flexible Text Mining for Knowledge Exploration in the Life Sciences", 
David Milward, Chief Technology Officer, Linguamatics
Synopsis: Text mining has provided value for some years in specific 
areas such as finding protein-protein interactions from Medline 
abstracts. This talk will provide examples of how the use of flexible 
text mining technologies, in particular Interactive Information 
Extraction, has allowed pharmaceutical companies to answer a much wider 
range of questions than was possible with either traditional text search 
or mining approaches. This applies across the entire pharmaceutical 
pipeline, in areas such as target validation, safety, biomarkers, 
pathways and translational science. A key to this is the ability to 
specify and fine-tune appropriate queries according to the nature of the 
task, and the kind of documents being mined, whether full text, 
abstracts or semi-structured reports. A "toolbox" approach allows users 
to mix and match the use of keywords, classes or concepts from 
ontologies, with the use of co-occurrence and/or precise linguistic 
patterns. This enables users to fully explore and extract facts and 
connections hidden in document databases, whether those connections are 
within single documents or across multiple ones.
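
As a toy illustration of the co-occurrence end of that toolbox (not 
Linguamatics' Interactive Information Extraction, whose query language 
is far richer), a sentence-level co-occurrence query in Python might 
look as follows; the protein and keyword lists are invented.

import re

PROTEINS = {"p53", "mdm2", "bcl-2"}
INTERACTION_WORDS = {"binds", "inhibits", "activates"}

def co_occurrences(text):
    """Return sentences mentioning both a protein and an interaction word."""
    hits = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        words = {w.lower().strip(".,;") for w in sentence.split()}
        if words & PROTEINS and words & INTERACTION_WORDS:
            hits.append(sentence)
    return hits

doc = "MDM2 binds p53 and inhibits its activity. Apoptosis was unchanged."
print(co_occurrences(doc))  # ['MDM2 binds p53 and inhibits its activity.']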

"Knowledge based search - towards answering questions", Dr. Michael R. 
Alvers, Prof. Dr. Michael Schroeder, TransInsight
The next generation of literature search engines will be intelligent. 
With the help of biomedical background knowledge they can give an 
overview of large query results and answer questions.

When scientists search, they have questions in mind:
Which cellular component is associated with the protein Rab5? Which 
biological process is inhibited by Aspirin? What is Richard Axel at 
Columbia working on? Is apoptosis a hot topic? Who is a top author 
working on lipid rafts? Which journals publish on stem cells?

The scientific literature holds answers to all of these questions, but 
it is difficult to obtain them with classical search engines, which 
merely present possibly long lists of search results and leave it up to 
the user to find the answer to his/her question. This problem is 
amplified by the exponential growth of biomedical literature databases 
such as PubMed, which already holds over 16,000,000 abstracts. Thus, to 
find answers rather than just search for them, next-generation search 
engines have to be intelligent and use biomedical background knowledge. 
Fortunately, such knowledge already exists: one example is the 
GeneOntology, a hierarchical vocabulary of some 20,000 terms for 
biological processes, molecular functions, and cellular components.

Key to this new search paradigm is the background knowledge used to 
categorize documents. With efforts such as the GeneOntology, the needed 
knowledge is readily available. The GeneOntology contains, for example, 
the facts that endosome is a cellular component (is-a relationship), 
that apoptosis is also known as programmed cell death (synonyms) and 
that caspases are part of the apoptotic program (part-of relationship). 
The central problem of ontology-based search is the mapping of ontology 
terms to text. This problem, known as term extraction, is difficult 
because authors do not write abstracts with an ontology in mind. The 
mapping must therefore be flexible, matching the ontology term 
"transcription factor binding" to the text "...a transcription factor 
that binds...", even though the term does not appear literally. The 
talk describes GoPubMed.org and gives examples of how background 
knowledge can significantly improve searching.
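
A minimal sketch of such flexible term matching in Python (an assumed 
approach for illustration, not GoPubMed's actual algorithm): an 
ontology term matches wherever crude stems of all its tokens appear 
within a small window of words, in any order.

def stem(word):
    """Very crude suffix stripping, for illustration only."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word

def term_matches(term, text, window=6):
    """True if stems of all term tokens co-occur in a window of words."""
    term_stems = {stem(t) for t in term.lower().split()}
    words = [stem(w.strip(".,")) for w in text.lower().split()]
    return any(term_stems <= set(words[i:i + window])
               for i in range(len(words)))

print(term_matches("transcription factor binding",
                   "a transcription factor that binds DNA"))  # True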

How to Register
===============

To register and reserve your place please go to 
http://www.discoveryforum.net

-- 
Dr Jon Rees (Network Director) 
Oxfordshire Bioscience Network
Oxford Brookes University 
Headington Campus, Gipsy Lane 
Oxford OX3 0BP 
United Kingdom 
Tel. +44 (0) 1865 484224 
Fax. +44 (0) 1865 484478 
jon.rees at oxfordshirebioscience.net
http://www.oxfordshirebioscience.net

###

Save March 28th 2007 in your diary for BioTrinity!

BioTrinity - Drug Discovery, Diagnostics and Medical Devices 
The first biotechnology and healthcare partnering meeting to showcase the Oxford and South East super-cluster.

http://www.biotrinity.com

###


