sequence alignment by Minimum Message Length (MML) encoding

lloyd allison lloyd at bruce.cs.monash.edu.au
Mon Sep 30 00:00:38 EST 1991


    DNA alignment based on Minimum Message Length encoding (MML).

A set of routines based on the MML principle is available for research into
the alignment problem for two strings and the modelling of the mutation
process.
They are available (via email) at no charge for non-commercial, non-classified
research purposes.
They derive from work started in

  Allison L. & C.N.Yee. Bull. Math. Biol. 52(3) 431-453 1990

and extended and improved in

  Allison L., C.S.Wallace & C.N.Yee.
      AAAI Symposium on AI+Mol. Bio., Stanford, 1990.
and
  Tech report 90/148 Dept. Comp. Sci., Monash University, AUSTRALIA 3168

One-, three- and five-state models of mutation are implemented.
They model simple, linear and piecewise-linear indel costs respectively.
The probabilities of all alignments are summed (efficiently);
among other effects, this gives a smooth cost function in all cases.
Optimal parameter values are inferred from the given strings.
The parameter values are included in the message length, stated to
appropriate accuracy.
This allows alternative models to be compared on an equal footing.
There is a built-in null theory and significance test.

A simple driver program is provided to make the routines usable.
Graphical routines are provided to print a probability density plot of
all alignments on a laser printer.
The routines are written in `C'.
They are moderately, but not excessively, heavy users of CPU time; a good
workstation or more powerful machine with *hardware* floating-point
arithmetic is recommended for use on long strings.
They are not intended for quickly searching large databases of sequences.

To get a `shar' script of the routines, send (e)mail to the address below.
Tech report 90/148 can be requested the same way, but please include a
"real" paper (snail) mail address.

Lloyd ALLISON
Department of Computer Science, UUCP:lloyd at bruce.cs.monash.edu.au
Monash University, Clayton,     or  :uunet!munnari!bruce.cs.monash.edu.au!lloyd
VICTORIA 3168, AUSTRALIA        Tel :565-5205               FAX: +61 3 565 5146

end.
--
Domain: curtiss at umiacs.umd.edu		     Phillip Curtiss
  UUCP:	uunet!mimsy!curtiss		UMIACS - Univ. of Maryland
 Phone:	+1-301-405-6710			  College Park, Md 20742


