Andrew Rambaut wrote:
> Anders Gorm Pedersen wrote:
>> I think you might find Hidden Markov Models (HMMs) to be useful for this
>> kind of thing. Briefly this type of model can be estimated ("trained") on
>> a set of aligned sequences, and then used in "generative mode" to produce
>> sequences having the same characteristics as the aligned set. I've [...]
> How do these models deal with phylogeny? Does the model estimate the
> phylogenetic relationships between the sequences or assume independent
> lineages (or some sort of pair-wise relationship)?
In their simplest form, hidden Markov models don't deal with the phylogeny
at all, but rely on the unbiased (?) information that can be extracted by
"training" the model on the alignment. Depending on how one constructs the
model, this may include nucleotide frequencies, dinucleotide frequencies,
trinucleotide frequencies, etc. (HMMs can be used for protein sequences as
well as DNA sequences). They can also model site-specific rates of indels
and substitutions, take codon structure into account, and many other
things.
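To make the "train, then generate" idea concrete, here is a rough Python sketch. It is deliberately much simpler than a real profile HMM: it has one match state per alignment column and no insert or delete states, so it cannot model indels. The function names, the pseudocount value and the toy alignment are all made up for illustration.

```python
import random
from collections import Counter

ALPHABET = "ACGT"

def train_columns(alignment, pseudocount=1.0):
    """Estimate one emission distribution per alignment column."""
    model = []
    for i in range(len(alignment[0])):
        # Count observed bases in column i, skipping gaps and ambiguity codes.
        counts = Counter(seq[i] for seq in alignment if seq[i] in ALPHABET)
        total = sum(counts.values()) + pseudocount * len(ALPHABET)
        model.append({b: (counts[b] + pseudocount) / total for b in ALPHABET})
    return model

def generate(model, rng):
    """Sample one sequence, drawing each site from its column distribution."""
    return "".join(
        rng.choices(ALPHABET, weights=[col[b] for b in ALPHABET])[0]
        for col in model
    )

toy_alignment = ["ACGT", "ACGA", "ACGT", "TCGT"]  # made-up data
model = train_columns(toy_alignment)
sample = generate(model, random.Random(0))
```

Note that nothing here knows about the tree relating the sequences: every aligned sequence counts equally, which is exactly the sense in which the simplest models ignore phylogeny.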
It is also possible to hardwire prior information into an HMM (average
transition/transversion rates for instance) and possibly then refine this
in a site-dependent manner by training on an alignment of the gene family
being investigated.
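One way to picture this (a sketch under my own assumptions, not a description of any particular HMM package) is to encode the prior as pseudo-observations and let the alignment column pull the estimate away from it. Here `kappa` is the weight given to the single possible transition relative to each transversion, `conserved` is the prior probability of seeing the consensus base, and `strength` is how many pseudo-observations the prior is worth. All three names and their values are hypothetical.

```python
TRANSITION_PARTNER = {"A": "G", "G": "A", "C": "T", "T": "C"}

def prior_emissions(consensus, kappa=2.0, conserved=0.9):
    """Prior over bases at a site, favouring transitions over transversions."""
    weights = {
        b: (kappa if b == TRANSITION_PARTNER[consensus] else 1.0)
        for b in "ACGT" if b != consensus
    }
    total = sum(weights.values())
    # Split the non-conserved probability mass in proportion to the weights.
    prior = {b: (1.0 - conserved) * w / total for b, w in weights.items()}
    prior[consensus] = conserved
    return prior

def refine(prior, column, strength=4.0):
    """Site-specific estimate: observed counts plus prior pseudo-observations."""
    n = sum(1 for b in column if b in prior)  # ignore gaps
    return {
        b: (column.count(b) + strength * prior[b]) / (n + strength)
        for b in prior
    }

prior = prior_emissions("A")
site = refine(prior, "AAAG")  # made-up alignment column
```

With few sequences the estimate stays close to the hardwired prior; with many sequences the column counts dominate.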
Of course, you don't want to make your model overly complicated by having
too many parameters (that would defeat the purpose of modelling in the
first place), or at least not more parameters than the size of your data
set supports.
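A back-of-the-envelope way to check this is to compare the number of free emission parameters with the number of observed residues. The sketch below assumes a position-specific model in which each column conditions on the `order` preceding residues (0 for plain nucleotide frequencies, 2 for trinucleotide context); the function names and example sizes are made up.

```python
def free_parameters(n_columns, alphabet_size=4, order=0):
    """Free emission parameters of a position-specific model."""
    contexts = alphabet_size ** order
    # Each distribution sums to one, so it has alphabet_size - 1 free values.
    return n_columns * contexts * (alphabet_size - 1)

def observations(n_sequences, n_columns):
    """Residues available for training, ignoring gaps."""
    return n_sequences * n_columns

data = observations(n_sequences=20, n_columns=100)
simple = free_parameters(n_columns=100, order=0)
contextual = free_parameters(n_columns=100, order=2)
```

In this example the context-free model has far fewer parameters than data points, while the trinucleotide-context model has more parameters than residues to estimate them from.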
As mentioned, good starting points for learning about HMMs can be found on
the website of my colleague Anders Krogh:
http://www.cbs.dtu.dk/krogh/
http://www.cbs.dtu.dk/krogh/refs.html
A good introduction is:
A. Krogh 1998. An Introduction to Hidden Markov Models for Biological
Sequences, In S. L. Salzberg et al., eds., Computational Methods in
Molecular Biology, 45-63. Elsevier.
A few refs about evolution and HMMs:
Felsenstein J, Churchill GA., Mol Biol Evol 1996 Jan;13(1):93-104
A Hidden Markov Model approach to variation among sites in rate of
evolution.
von Haeseler A, Schoniger M., J Comput Biol 1998 Spring;5(1):149-63
Evolution of DNA or amino acid sequences with dependent sites.
Schadt EE, Sinsheimer JS, Lange K., Genome Res 1998 Mar;8(3):222-33
Computational advances in maximum likelihood methods for molecular
phylogeny.
Mitchison GJ., J Mol Evol 1999 Jul;49(1):11-22
A probabilistic treatment of phylogeny and sequence alignment.
McGuire G, Wright F, Prentice MJ., J Comput Biol 2000 Feb-Apr;7(1-2):159-70
A Bayesian model for detecting past recombination events in DNA multiple
alignments.
--
Anders Gorm Pedersen, Ph.D.
Center for Biological Sequence Analysis, www.cbs.dtu.dk
Technical University of Denmark