***** NEW: In response to a number of requests, the deadline
for submission has been extended to JULY 5
[Apologies for multiple postings]
ECML/PKDD-2006 Workshop on
DATA AND TEXT MINING FOR INTEGRATIVE BIOLOGY
http://cui.unige.ch/~hilario/ecml-pkdd06-biows/
The Workshop on Data and Text Mining for Integrative Biology will be
held on September 18, 2006 in conjunction with the 17th European
Conference on Machine Learning and the 10th European Conference on
Principles and Practice of Knowledge Discovery in Databases in
Berlin, Germany (http://www.ecmlpkdd2006.org).
Background
Increasing use of high-throughput methods in molecular biology has
spawned unprecedented volumes as well as novel types of data. DNA and
mRNA microarrays, mass spectrometry, and SNP chips are a few examples of
technologies that allow biologists to line up hundreds of experiments
while studying thousands of genes or proteins in a single experiment.
Thus high volume and high dimensionality are hallmarks of biological
data that data miners must cope with.
The availability of comprehensive datasets on key biological entities
has led life scientists from a reductionist, component-centred approach
to a more holistic or systemic approach. They can now examine
interactions among proteins or between DNA and proteins to build models
of molecular pathways and networks in an effort to understand the
functioning of cells, tissues and organisms. The trend toward systems
biology compounds problems of scale and high dimensionality with that of
increasing complexity: analysis must be pursued at multiple levels of
organization in order to achieve a comprehensive and coherent view of a
system's structure and dynamics. Systems biologists expect data miners
to provide them with the computational tools for representing,
integrating and modeling heterogeneous data as well as deciphering
complex patterns and systems.
Notwithstanding the massive amounts of biological data accessible in
over a thousand databases, the ultimate source of up-to-date information
in the field remains the biological literature. Despite the efforts of
curators who tirelessly scan known document servers, existing databases
cannot keep up with the current pace of scientific publishing; text
documents contain critical information that is not and may never be
found in structured databases. There is a need for efficient methods of
retrieving relevant documents and extracting information needed by both
curators and biological practitioners. Text mining research has made
significant strides in recent years, yet the stack of well-known
unsolved problems (e.g., anaphora resolution, word sense disambiguation)
keeps growing with new challenges such as analyzing text at multiple
levels of granularity to extract and formalize complex information
concerning networks and systems.
Goals and Intended Audience
The emergence of systems biology poses a new set of challenges for both
data mining and text mining research. The goals of this workshop are:
1. to bring together researchers in data and text mining to discuss
latest insights and innovations as well as open issues related to
biological knowledge discovery from data and text;
2. to invite a number of computational and bench biologists to an
exploratory dialogue with data/text miners. Focus will be on unmet
data-analytical needs of biologists and computational research
tracks opened by the current trend toward integrative
bioinformatics/biology.
Topics
Technical papers are solicited on the following and related subjects:
Biological data mining
* Learning from high dimensional data
* Assessing and ensuring model stability and result reproducibility
* Integrating and analysing large-scale and disparate data (e.g.,
gene microarrays, SNPs, mass spectrometry, NMR, aptamers)
* Representing and mining complex data: sets and lists, pathways and
networks, time series, hierarchies and systems
* Linking genes or proteins to stages of disease development, going
beyond correlations to investigate causal relations
* Temporal modeling of molecular pathways and cellular mechanisms
* Data mining support for integrating genomics, proteomics, metabolomics
Biological text mining
* Innovative techniques for biological information retrieval and
extraction
* Corpus generation for effective text mining
* IE techniques to limit/overcome dependence on corpus pre-annotation
* Extracting complex information from text (pathways, networks,
processes, causal relations)
* Platforms for functional, transcriptomic, proteomic and integrated
database annotation
Integration issues in data and text mining
* Representing and using biological knowledge in data/text mining
* Mapping and aligning, evaluating and validating biological ontologies
* Use of ontologies to support data and text mining
* Learning ontologies from text collections and structured databases
* Semantic integration of information from structured data and text
* Learning from multiple heterogeneous data sources
* Integrating biological knowledge management and discovery
* Integrating data and text mining for biological knowledge discovery
For interdisciplinary discussions involving biologists,
bioinformaticians and data/text mining specialists, position papers on
the following and other related topics are welcome:
* Integration of discovery- and hypothesis-driven approaches in data
mining and systems biology
* Novel biological experimental techniques that have not received
adequate attention from the KDD community
* Identifying major bottlenecks in the extraction of new biological
knowledge from available data
* New -omics (e.g., metabolomics, lipidomics): their potential and
data-analytical requirements
* Prospects of using proposed data mining techniques and the
resulting models as clinical diagnostic tools
* Customizing data mining environments to specific needs of
biological users (ergonomics, interaction patterns, presentation of
results)
Submission
Authors are invited to submit papers related to the topics listed above.
Technical papers will be assessed based on relevance, originality,
significance, technical soundness, and clarity of presentation by at
least 2 Program Committee members. Position papers will be examined by
both the PC and the Committee of Biology Experts for their potential to
stimulate constructive discussions and eventual collaboration between
data mining and biology specialists. Position papers from biologists
raising novel data mining challenges that arise from integration issues
in biological research will be particularly welcome.
The maximum length of submissions is 12 pages for technical papers and 6
pages for position papers. All submissions should follow the
Springer-Verlag LNCS format. Author instructions and style files are
available at
http://www.springer.com/east/home/computer/lncs/authors.html. Please
send submissions as PDF files to Melanie.Hilario_at_cui.unige.ch and
Claire.Nedellec_at_jouy.inra.fr. A number of high quality papers will be
selected for publication, in expanded and revised form, in an
international journal with a strong focus on both knowledge discovery
and bioinformatics.
Program
To achieve the two-fold goal stated above, we propose a hybrid workshop
format composed of:
* (morning) technical sessions with oral presentations by data/text
mining researchers
* (afternoon) interdisciplinary sessions with the participation of
bioinformaticians and bench biologists. These sessions will include:
o 2 invited talks
o presentation of position papers by both biologists and
data/text miners
o a panel on selected issues from the topics proposed for the
interdisciplinary sessions
* (breaks) poster displays: To stimulate informal and more focused
discussions, all participants, data/text mining specialists and
biologists alike, will be encouraged to display posters describing
their work or simply their needs, expectations, etc.
Workshop Chairs
Melanie Hilario
Artificial Intelligence Lab, University of Geneva (CUI)
Melanie.Hilario at cui.unige.ch
Claire Nédellec
MIG, Institut National de la Recherche Agronomique (INRA)
Claire.Nedellec at jouy.inra.fr
Program Committee
Florence d'Alché-Buc      University of Evry (France)
Sophia Ananiadou          University of Manchester & NaCTeM (UK)
Christian Blaschke        Bioalma (Spain)
Nigel Collier             National Institute of Informatics (Japan)
James Cussens             University of York (UK)
A. Fazel Famili           NRC Institute for IT (Canada)
Lynette Hirschman         The MITRE Corp. (USA)
Alexandros Kalousis       University of Geneva (Switzerland)
Stefan Kramer             University of Freiburg (Germany)
Maria Liakata             University of Wales, Aberystwyth (UK)
Adeline Nazarenko         Université Paris-Nord (France)
See-Kiong Ng              Institute for Infocomm Research (Singapore)
Srinivasan Parthasarathy  Ohio State University (USA)
Céline Rouveirol          LRI Orsay (France)
Jasmin Saric              EML Research gGmbH (Germany)
Jude Shavlik              University of Wisconsin-Madison (USA)
Jaak Vilo                 University of Tartu (Estonia)
A scientific committee composed of bioinformaticians and lab biologists
will provide their expertise on matters related to biology and the
orientation of the interdisciplinary sessions:
Committee of Biology Experts
Terri K. Attwood          University of Manchester (UK)
Philippe Beissières       MIG, INRA (France)
François Radvanyi         Institut Curie (France)
Jean-Charles Sanchez      University Hospital of Geneva (Switzerland)
Antonia Vlahou            BioAcademy of Athens (Greece)
Important Dates
Submission deadline         July 5th, 2006
Notification of acceptance  July 26th, 2006
Camera-ready papers due     August 9th, 2006
=================================================================
Melanie Hilario               Melanie.Hilario at cui.unige.ch
CUI - University of Geneva    Tel: +41 22/379 7791
24 rue General-Dufour         Fax: +41 22/379 7780
CH-1211 Geneva 4, Switzerland
=================================================================