[Computational-biology] ECML-PKDD WS on Data and Text Mining for Integrative Biology

Melanie Hilario Melanie.Hilario at cui.unige.ch
Wed Jun 21 11:14:07 EST 2006

[Apologies for multiple postings]

                   ECML/PKDD-2006 Workshop on
          Data and Text Mining for Integrative Biology

The Workshop on Data and Text Mining for Integrative Biology will be
held on September 18, 2006 in conjunction with the 17th European
Conference on Machine Learning and the 10th European Conference on
Principles and Practice of Knowledge Discovery in Databases in
Berlin, Germany (http://www.ecmlpkdd2006.org).


Increasing use of high-throughput methods in molecular biology has
spawned unprecedented volumes as well as novel types of data. DNA and
mRNA microarrays, mass spectrometry, and SNP chips are a few examples of
technologies that allow biologists to line up hundreds of experiments
while studying thousands of genes or proteins in a single experiment.
Thus high volume and high dimensionality are hallmarks of biological
data that data miners must cope with.

The availability of comprehensive datasets on key biological entities
has led life scientists from a reductionist, component-centred approach
to a more holistic or systemic approach. They can now examine
interactions among proteins or between DNA and proteins to build models
of molecular pathways and networks in an effort to understand the
functioning of cells, tissues and organisms. The trend toward systems
biology compounds problems of scale and high dimensionality with those
of increasing complexity: analysis must be pursued at multiple levels of
organization in order to achieve a comprehensive and coherent view of a
system's structure and dynamics. Systems biologists expect data miners
to provide them with the computational tools for representing,
integrating and modeling heterogeneous data as well as deciphering
complex patterns and systems.

Notwithstanding the massive amounts of biological data accessible in
over a thousand databases, the ultimate source of up-to-date information
in the field remains the biological literature. Despite the efforts of
curators who tirelessly scan known document servers, existing databases
cannot keep up with the current pace of scientific publishing; text
documents contain critical information that is not and may never be
found in structured databases. There is a need for efficient methods of
retrieving relevant documents and extracting information needed by both
curators and biological practitioners. Text mining research has made
significant strides in recent years, yet well-known problems (e.g.,
anaphora resolution, word sense disambiguation) remain unsolved even as
new challenges emerge, such as analyzing text at multiple levels of
granularity to extract and formalize complex information concerning
networks and systems.

     Goals and Intended Audience

The emergence of systems biology poses a new set of challenges for both
data mining and text mining research. The goals of this workshop are:

    1. to bring together researchers in data and text mining to discuss
       the latest insights and innovations as well as open issues related to
       biological knowledge discovery from data and text;
    2. to invite a number of computational and bench biologists to an
       exploratory dialogue with data/text miners. The focus will be on
       the unmet data-analytical needs of biologists and on computational
       research tracks opened by the current trend toward integrative
       biology.


Technical papers are solicited on the following and related subjects:

       Biological data mining

     * Learning from high dimensional data
     * Assessing and ensuring model stability and result reproducibility
     * Integrating and analysing large-scale and disparate data (e.g.,
       gene microarrays, SNPs, mass spectrometry, NMR, aptamers)
     * Representing and mining complex data: sets and lists, pathways and
       networks, time series, hierarchies and systems
     * Linking genes or proteins to stages of disease development, going
       beyond correlations to investigate causal relations
     * Temporal modeling of molecular pathways and cellular mechanisms
     * Data mining support for integrating genomics, proteomics and metabolomics

       Biological text mining

     * Innovative techniques for biological information retrieval and
       extraction
     * Corpus generation for effective text mining
     * Information extraction (IE) techniques to limit/overcome dependence
       on corpus pre-annotation
     * Extracting complex information from text (pathways, networks,
       processes, causal relations)
     * Platforms for functional, transcriptomic, proteomic and integrated
       database annotation

       Integration issues in data and text mining

     * Representing and using biological knowledge in data/text mining
     * Mapping and aligning, evaluating and validating biological ontologies
     * Use of ontologies to support data and text mining
     * Learning ontologies from text collections and structured databases
     * Semantic integration of information from structured data and text
     * Learning from multiple heterogeneous data sources
     * Integrating biological knowledge management and discovery
     * Integrating data and text mining for biological knowledge discovery

For interdisciplinary discussions involving biologists,
bioinformaticians and data/text mining specialists, position papers on
the following and other related topics are welcome:

     * Integration of discovery- and hypothesis-driven approaches in data
       mining and systems biology
     * Novel biological experimental techniques that have not received
       adequate attention from the KDD community
     * Identifying major bottlenecks in the extraction of new biological
       knowledge from available data
     * New -omics (e.g., metabolomics, lipidomics): their potential and
       data-analytical requirements
     * Prospects of using proposed data mining techniques and the
       resulting models as clinical diagnostic tools
     * Customizing data mining environments to the specific needs of
       biological users (ergonomics, interaction patterns, presentation of
       results)


Authors are invited to submit papers related to the topics listed above.
Technical papers will be assessed based on relevance, originality,
significance, technical soundness, and clarity of presentation by at
least 2 Program Committee members. Position papers will be examined by
both the PC and the Committee of Biology Experts for their potential to
stimulate constructive discussions and eventual collaboration between
data mining and biology specialists. Position papers from biologists
raising novel data mining challenges posed by integration issues in
biological research will be particularly welcome.

The maximum length of submissions is 12 pages for technical papers and 6
pages for position papers. All submissions should follow the
Springer-Verlag LNCS format. Author instructions and style files are
available at
http://www.springer.com/east/home/computer/lncs/authors.html. Please
send submissions as PDF files to Melanie.Hilario_at_cui.unige.ch and
Claire.Nedellec_at_jouy.inra.fr. A number of high quality papers will be
selected for publication, in expanded and revised form, in an
international journal with a strong focus on both knowledge discovery
and bioinformatics.


To achieve the two-fold goal stated above, we propose a hybrid workshop
format composed of:

     * (morning) technical sessions with oral presentations by data/text
       mining researchers
     * (afternoon) interdisciplinary sessions with the participation of
       bioinformaticians and bench biologists. These sessions will include:
           o 2 invited talks
           o presentation of position papers by both biologists and
             data/text miners
           o a panel on selected issues from the topics proposed for the
             interdisciplinary sessions
     * (breaks) poster displays: To stimulate informal and more focused
       discussions, all participants, data/text mining specialists and
       biologists alike, will be encouraged to display posters describing
       their work or simply their needs, expectations, etc.

     Workshop Chairs

Melanie Hilario
Artificial Intelligence Lab, University of Geneva (CUI)
Melanie.Hilario at cui.unige.ch

Claire Nédellec
MIG, Institut National de la Recherche Agronomique (INRA)
Claire.Nedellec at jouy.inra.fr

      Program Committee

Florence d'Alché-Buc	  University of Evry (France)
Sophia Ananiadou	  University of Manchester & NaCTeM (UK)
Christian Blaschke	  Bioalma (Spain)
Nigel Collier	          National Institute of Informatics (Japan)
James Cussens	          University of York (UK)
A. Fazel Famili	          NRC Institute for IT (Canada)
Lynette Hirschman	  The MITRE Corp. (USA)
Alexandros Kalousis	  University of Geneva (Switzerland)
Stefan Kramer	          University of Freiburg (Germany)
Maria Liakata	          University of Wales, Aberystwyth (UK)
Adeline Nazarenko	  Université Paris-Nord (France)
See-Kiong Ng	          Institute for Infocomm Research (Singapore)
Srinivasan Parthasarathy  Ohio State University (USA)
Céline Rouveirol	  LRI Orsay (France)
Jasmin Saric              EML Research gGmbH (Germany)
Jude Shavlik	          University of Wisconsin-Madison (USA)
Jaak Vilo	          University of Tartu (Estonia)

A scientific committee composed of bioinformaticians and lab biologists
will provide expertise on matters related to biology and on the
orientation of the interdisciplinary sessions:

     Committee of Biology Experts

Terri K. Attwood	University of Manchester (UK)
Philippe Bessières      MIG, INRA (France)
François Radvanyi	Institut Curie (France)
Jean-Charles Sanchez	University Hospital of Geneva (Switzerland)
Antonia Vlahou	        BioAcademy of Athens (Greece)

     Important Dates

Submission deadline	        June 28th, 2006
Notification of acceptance	July 26th, 2006
Camera-ready papers due	        August 9th, 2006

Melanie Hilario                      Melanie.Hilario at cui.unige.ch
CUI - University of Geneva           Tel: +41 22/379 7791
24 rue General-Dufour                Fax: +41 22/379 7780
CH-1211 Geneva 4, Switzerland
