
Fwd: Symposium on Large Phylogenies

Geoff Read g.read at niwa.cri.nz
Tue Dec 9 19:29:56 EST 1997

Forwarded from a newsgroup.
The preamble at least might interest some of you.

From: Junhyong Kim <Junhyong_kim at quickmail.yale.edu>
Newsgroups: sci.bio.systematics
Subject: Symposium on Large Phylogenies
Date: Mon, 01 Dec 1997 15:44:15 -0400


Estimating Large Scale Phylogenies: Biological, Statistical, and
Algorithmic Problems

SPONSORS: DIMACS and University of Pennsylvania Program in Computational
Biology
LOCATION: Princeton University
DATE: June 26-28, 1998
FORMAT: Paper presentations and posters. All papers for oral presentation
must be submitted in full and will be peer reviewed.

PAPER SUBMISSION DEADLINE: April 15, 1998. Please submit papers by mail or
email (ps file/MS Word file only) to:

Junhyong Kim
Dept. of Biology
Yale University
165 Prospect St.
New Haven, CT 06511
(203)-432-3854 (fax)
junhyong_kim at quickmail.yale.edu

Co-organizers: Junhyong Kim, Tandy Warnow, and Ken Rice


Biological organization is fundamentally based on an evolutionary
history of bifurcating descent-with-modification. Phylogenetic
estimation is the inference of this genealogical history from
present-day data. Phylogenetic trees, the graph representation of the
genealogical history, play a central role in evolutionary biology, and
phylogenetic estimation techniques are being applied to a wide variety of
computational biology problems.

The size of a phylogenetic estimation problem is measured by the number of
taxa and the number of characters. Until recently, computational and data
limitations kept most phylogenetic estimation problems to small numbers of
taxa. However, the availability of computational resources and the influx of
large molecular data sets are enabling researchers to tackle increasingly
large problems, and the analysis of large-scale data sets is rapidly
becoming a central problem in phylogenetic biology.

Recent experimental evidence has established the existence of large
trees that can be estimated accurately as well as those that are
difficult to accurately estimate with reasonable numbers of characters.
Some of these examples have suggested that taxon sampling (increasing the
size of the estimation problem through the addition of taxa rather than
characters) might lead to more easily estimated trees. Conversely, it has
been argued that big trees are hard to infer for a variety of reasons:
NP-hardness of the optimization problems, properties of the search space,
inadequacy of the heuristics, and even possible inadequacy of the
optimization criteria. Unfortunately, very little actual evidence is
available to support any conjectures about how the performance of
estimators scales with the size of the phylogenetic problem. In
addition, the question of scaling is itself confused by poorly delineated
notions. For example, the size of the tree also involves the maximum
amount of divergence (not only the number of taxa and characters), and
measures of estimator performance have also not been standardly agreed upon.

The goal of this symposium is to precisely identify the key problems
with respect to how the performance of phylogenetic estimators scales
with the size of the problem, and to gather experimental and theoretical
results addressing this problem.


The symposium will consist of four topic sessions with paper
presentations followed by a panel discussion of invited experts. The
four topics and some of the questions to be addressed in each session are:

Biological problems

1. What are the limits to sampling characters and taxa?
2. What are examples of very difficult problems?
3. What are the reasonable models of character evolution and tree shape?
4. What are the most important problems in systematics? 
5. What can we say about evolutionary history from data other than rows 
and columns of homologous characters?

Empirical results

1. What do simulation studies tell us about performance of different
methods and how they scale with the size of the problem?
2. What properties of the tree models affect accuracy and how do those
properties scale? 
3. Are there any methodological biases? 
4. What can we say about performance under more realistic models of
sequence evolution from the existing studies? 
5. Is there a need to standardize experimental studies, perhaps through
the establishment of a testbed of different model trees, methods, etc?

Algorithmic problems

1. What is the relationship between standard optimization problems
(distance-based criteria, parsimony, etc) and estimating the topology of
evolutionary trees? Which of the standard optimization criteria are best
suited to obtaining highly accurate topology estimations, given bounds on
the available sequence length? 
2. How much of the difficulty is due to inadequate solutions to the right
NP-hard optimization problems?
3. Are there new optimization problems or approaches (not necessarily
linked to optimization criteria) that are promising?
4. How good are the existing heuristics for solving the relevant
optimization problems, and what new approaches might give better results
on important optimization problems? 
5. How should we evaluate performance of algorithms?
6. Are there 'algorithms engineering' issues which will make these
methods less powerful, and how do we handle them? 
7. Is it possible to design methods which can efficiently characterize all
optimal and near-optimal trees, rather than just a single optimum?

Statistical problems

1. What bounds can we obtain on the convergence rate of different methods?
2. How do various statistical properties of different methods scale with
the size of the problem? 
3. What is the relationship between estimating the whole tree versus some
subset of the tree? 
4. What is the distribution of specific tree characteristics such as
smallest edge length, smallest diameter for quartet covering, stemminess,
etc. with respect to tree model sampling distribution?
5. Can we obtain accuracy-bounded estimates (sacrificing resolution)?

Answers please on the back of a small envelope to:

   Discuss  =  <annelida at net.bio.net> = talk to all members
   Server =  <biosci-server at net.bio.net> = un/subscribes
   Archives  = http://www.bio.net:80/hypermail/ANNELIDA/
   Resources = http://biodiversity.uno.edu/~worms/annelid.html
