Computational Biology at Penn

Tandy Warnow tandy at
Fri Oct 14 18:28:12 EST 1994

                   Sponsored by the Departments of
        Computer and Information Science, Genetics, and Biology
                  at the University of Pennsylvania

         "The Statistical Significance of Biological Sequence
                        Database Comparisons"

                       Dr. Michael S. Waterman
                      Department of Mathematics
                   University of Southern California

                     3pm Friday, October 28, 1994
                      University of Pennsylvania
                    3401 Walnut Street*, Suite 401C
                           Philadelphia, PA

One of the most important activities in computational biology is the
estimation of similarity between two sequences of DNA or protein,
presumably a reflection of their evolutionary relatedness.  A central
question in biological sequence comparison is the statistical
significance of an observed similarity, such as one that arises in a
database search.  This problem has been rigorously addressed in the
case of BLAST and related algorithms for local alignment of
subsequences allowing mismatches but not gaps due to insertions and
deletions.  However, the problem of determining the statistical
significance of optimal local alignments containing such gaps,
typically computed using well-known dynamic programming algorithms, has
so far not been solved mathematically.  A practical method will be
presented to approximate the probability that a local alignment score
is a result of chance alone.  For a given set of similarity scores and
gap penalties, only one simulation of random alignments needs to be run
to derive the key information needed to estimate the significance of
any alignment computed under those settings.
Applications to database searching and the analysis of pairwise and
self-comparisons of proteins will be described.
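
As a rough illustration of the general idea only (not the speaker's
method, whose details are not given in this abstract), the sketch below
scores gapped local alignments of random sequence pairs with a standard
Smith-Waterman recurrence, fits an extreme value (Gumbel) distribution
to those scores by the method of moments, and uses the fit to
approximate the chance probability of an observed score.  All function
names, the scoring values (match=1, mismatch=-1, gap=-2), the uniform
DNA alphabet, and the fixed sequence length are assumptions made for
this example.

```python
import math
import random

def local_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best gapped local alignment score (Smith-Waterman, linear gap cost)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            s = match if x == y else mismatch
            # Restart at 0, extend diagonally, or open/extend a gap.
            v = max(0, prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
            cur.append(v)
            best = max(best, v)
        prev = cur
    return best

def simulate(n_pairs, length, alphabet="ACGT", seed=0):
    """Local alignment scores of n_pairs random sequence pairs (assumed null model)."""
    rng = random.Random(seed)
    def rand_seq():
        return "".join(rng.choice(alphabet) for _ in range(length))
    return [local_score(rand_seq(), rand_seq()) for _ in range(n_pairs)]

def fit_gumbel(scores):
    """Method-of-moments estimate of Gumbel location (mu) and scale (beta)."""
    n = len(scores)
    mean = sum(scores) / n
    var = sum((s - mean) ** 2 for s in scores) / (n - 1)
    beta = math.sqrt(6.0 * var) / math.pi
    mu = mean - 0.5772156649 * beta  # Euler-Mascheroni constant
    return mu, beta

def p_value(score, mu, beta):
    """Approximate P(S >= score) under the fitted Gumbel distribution."""
    return -math.expm1(-math.exp(-(score - mu) / beta))

if __name__ == "__main__":
    null_scores = simulate(n_pairs=200, length=60)
    mu, beta = fit_gumbel(null_scores)
    print("estimated P(score >= 20):", p_value(20, mu, beta))
```

A fit obtained this way applies only to the sequence lengths, letter
frequencies, scores, and gap penalties that were simulated.  The method
announced above is described as needing just one simulation per scoring
setting, which this naive sketch does not attempt to reproduce.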

This is the inaugural seminar of a series sponsored by the new Penn
Computational Biology Program, an interdisciplinary effort involving
the Departments of Biology, Genetics, and Computer and Information
Science.  Penn has recently been awarded a training grant by the
National Science Foundation for PhD students and post-docs interested
in work spanning these fields.  For further information, contact Dr.
Warren Ewens, at 898-7109, or by e-mail at wewens at

