lloyd at bruce.cs.monash.edu.au
Thu Feb 10 19:05:58 EST 1994
may be of interest to multiple-aligners and to evolutionary-tree unscramblers:
%A L. Allison
%A C. S. Wallace
%T An Information Measure for the String to String Correction Problem with
%J 17th Annual Computer Science Conference
%W University of Canterbury, Christchurch, New Zealand
%O Australian Computer Science Communications, 16(1 C)
%E G. Gupta
%K MolBio, ACSC ACSC17 ACSC94, sequence string analysis, multiple alignment,
family evolutionary phylogenetic tree trees, Gibbs sampling,
simulated annealing, MML MDL
%X isbn 0-473-02313-X
Algorithms for approximate string matching are widely used to infer a
relation between two strings in situations where strings mutate,
evolve or contain noise, eg. DNA or protein strings.
Such a relation is (just) a hypothesis.
It has been shown how to calculate a probability for such hypotheses.
Unfortunately the matching algorithms do not directly extend to more than
two strings because of rapidly increasing complexity.
However, a stochastic method is one way around this difficulty.
The probabilities of competing hypotheses are used to generate
random alignments of two or more strings.
The alignments are sampled from the correct probability distribution.
Averaging over many such alignments gives good estimates of how closely
the strings are related and in what way. (Mediocrity is useful.)
In addition, sampling in an increasingly selective way gives
a simulated annealing search for an optimal multiple-alignment.
Central Inductive Agency,
Department of Computer Science, email: lloyd at cs.monash.edu.au
Monash University, Clayton, tel: 61 3 905 5205
Victoria 3168, AUSTRALIA fax: 61 3 905 5146
More information about the Bio-soft