sequence alignment by Minimum Message Length (MML) encoding
lloyd at bruce.cs.monash.OZ.AU
Sun May 5 21:23:25 EST 1991
DNA alignment based on Minimum Message Length encoding (MML).
A set of routines based on the MML principle is available for research into
the alignment problem for two strings and the modelling of mutation.
They are available (via email) at no charge for non-commercial, non-classified use.
They derive from work started in
Allison L. & C.N.Yee. Bull. Math. Biol. 52(3) 431-453 1990
and extended and improved in
Allison L., C.S.Wallace & C.N.Yee.
AAAI Symposium on AI+Mol. Bio., Stanford, 1990.
and Tech report 90/148 Dept. Comp. Sci., Monash University, AUSTRALIA 3168
1-, 3- and 5-state models of mutation are implemented.
They model simple, linear and piece-wise-linear indel costs respectively.
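
The three cost shapes can be sketched as follows. This is only an
illustration of "simple", "linear" and "piece-wise linear" costs as functions
of indel length; the function names, the parameters and the choice of taking
the cheaper of two linear pieces are assumptions made for the example, not
code taken from the routines.

```c
#include <assert.h>

/* All names and parameters below are hypothetical; costs are in bits. */

/* Simple cost: every inserted or deleted character costs the same. */
double simple_indel_cost(int len, double per_char)
{
    assert(len >= 0);
    return per_char * len;
}

/* Linear cost: a one-off cost to open a run, plus a cost per character. */
double linear_indel_cost(int len, double open, double extend)
{
    assert(len >= 0);
    return len == 0 ? 0.0 : open + extend * len;
}

/* Piece-wise linear cost, sketched as the cheaper of two linear pieces:
 * one suited to short runs and one suited to long runs. */
double piecewise_indel_cost(int len,
                            double open_short, double extend_short,
                            double open_long, double extend_long)
{
    double s = linear_indel_cost(len, open_short, extend_short);
    double l = linear_indel_cost(len, open_long, extend_long);
    return s < l ? s : l;
}
```

With a cheap-to-open, steep piece and a dear-to-open, shallow piece, short
indels are charged by the first and long indels by the second.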
The probabilities of all alignments are added together (efficiently);
this gives a smooth cost function in all cases, amongst other effects.
Optimal parameter values are inferred from the given strings.
The parameter values are included in the message length at appropriate accuracy.
This allows comparison of alternative models on an equal footing.
There is an in-built null-theory and significance test.
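
As a rough illustration of a null-theory test in this setting: the strings
are taken to be related only if encoding them together under a model of
mutation gives a shorter message than encoding them separately. The sketch
below assumes a null theory of 2 bits per base (a uniform 4-letter alphabet)
and ignores the cost of stating the lengths; those choices and the function
names are assumptions of the example and may differ from the in-built test.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical null theory: the strings are unrelated and each base
 * costs 2 bits (uniform 4-letter alphabet). */
double null_theory_bits(size_t len_a, size_t len_b)
{
    return 2.0 * (double)(len_a + len_b);
}

/* Log-odds, in bits, in favour of the strings being related.
 * related_bits is the total message length under a model of mutation,
 * including the cost of stating its parameter values. */
double log_odds_bits(double related_bits, size_t len_a, size_t len_b)
{
    assert(related_bits >= 0.0);
    return null_theory_bits(len_a, len_b) - related_bits;
}

/* Non-zero if the "related" hypothesis beats the null theory. */
int related_is_acceptable(double related_bits, size_t len_a, size_t len_b)
{
    return log_odds_bits(related_bits, len_a, len_b) > 0.0;
}
```

Because the parameter costs are counted on the "related" side, a model with
more states has to pay for them before it can win.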
A simple driver program is provided to make the routines usable.
Graphical routines are provided to print a probability density plot of
all alignments on a laser printer.
The routines are written in `C'.
They are moderately, but not excessively, heavy users of CPU time and a good
workstation, or more powerful machine, having *hardware* floating-point
arithmetic is recommended for their use on long strings.
They are not intended for quickly searching large data bases of sequences.
To get a `shar' script of the routines send (e)mail to the address below;
ditto for the Tech report 90/148, but remember to include a "real" return address.
Department of Computer Science,   UUCP: lloyd at bruce.cs.monash.edu.au
Monash University, Clayton,       or  : uunet!munnari!bruce.cs.monash.edu.au!lloyd
VICTORIA 3168, AUSTRALIA          Tel : 565-5205   FAX: +61 3 565 5146