Massive Multiple Sequence Alignment tools?

Win Hide winhide at icon.co.za
Sat May 11 09:12:53 EST 1996


Sean Eddy wrote:
> 
> In article <4me5tb$1vi at swen.emba.uvm.edu> brianf at med.uvm.edu (Brian Foley) writes:
>   >Thank you very much for ideas on tools such as AMPS.
>   >In addition to performing an alignment on sequences, once
>   >I have all of the sequences ready for alignment, I was
>   >hoping to find a tool that would use the information obtained
>   >in a BLAST or FASTA run to help me obtain the sequences and
>   >clip out the region with high similarity to my query sequence.
> 
> You might check out hidden Markov model software. Two packages are
> publicly distributed that I know of: SAM from UC Santa Cruz
> (http://www.cse.ucsc.edu/research/compbio/sam.html) and HMMER from
> myself at Washington University
> (http://genome.wustl.edu/eddy/hmmer.html).
> 
> HMM multiple alignment algorithms are O(N) instead of O(N^2) in the
> number of sequences, so they are much more efficient for huge sequence
> sets. They also (in my hands) tend to be more accurate than other
> popular methods for large sequence sets (though for more reasonable
> numbers of sequences (10-50) I still prefer Clustal).  We've aligned
> sets as large as 2000+ sequences. Your 6000 would pose no problem.
> 
> HMMs can also allow you to align to a previous smaller multiple
> alignment. You can carefully hand-craft an alignment of a
> representative set of sequences, then align the rest of your 6000
> relative to that.
> 
> You can use an HMM built from your alignment to search for matches in
> other sequences. HMMER includes four different search algorithms: one
> for complete global alignment; one for Smith/Waterman local alignment;
> one for finding complete matches to the HMM in longer sequences (say,
> if you're trying to find several complete copies of immunoglobulin
> domains in a neural cell adhesion molecule sequence), and one for
> finding multiple non-overlapping Smith/Waterman local alignments.

I agree that this massive alignment situation can be overwhelming.
I have not had the pleasure of handling Sean's HMMer (*YET*) and so speak 
only from current experience. The SAM package is parallelized, and I 
believe arrangements can be made for large projects.

See http://www-hgc.lbl.gov/inf/maspar.html

Resources are actively available for massive alignment projects using 
simulated annealing and HMMs, although the hardware can impose memory 
constraints. There is an overlap here: 10-300 sequences can be handled 
using TIGR-MSA, and anything in excess of that via HMMs. Accounts on a 
DOE-funded MasPar are available at the Berkeley Labs. A good manual for 
SAM is also available from the URL Sean has listed above.
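
To make the O(N) versus O(N^2) point in Sean's quoted text concrete, here is a small Python sketch of the arithmetic. It rests on simplifying assumptions: the pairwise-based method compares every sequence with every other one, the HMM method aligns each sequence once to a single model per pass, and sequence length is ignored. The function names are made up for illustration and do not come from any of the packages mentioned.

```python
def pairwise_comparisons(n: int) -> int:
    """Number of distinct sequence pairs, n * (n - 1) / 2, which grows as O(N^2)."""
    return n * (n - 1) // 2


def model_alignments(n: int) -> int:
    """One alignment per sequence against a single model, which grows as O(N).

    Assumption: a single pass over the data. Real HMM training usually
    takes several passes, which multiplies this by a constant factor.
    """
    return n


if __name__ == "__main__":
    for n in (50, 300, 2000, 6000):
        print(
            f"{n:>5} sequences: {pairwise_comparisons(n):>10,} pairs "
            f"vs {model_alignments(n):>5,} alignments to the model"
        )
```

At 6000 sequences the pair count is close to 18 million, against 6000 sequence-to-model alignments per pass.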

We are tackling similar projects from South Africa to LBL and have been 
able to get quite a lot done, so it is "possible" to attempt these large 
problems with a realistic outcome in sight.
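
On Brian's original question about clipping out the region of high similarity, the clipping step itself is simple once the hit coordinates are in hand. The Python sketch below assumes the coordinates have already been read from a BLAST or FASTA report (the parsing is not shown), that they are 1-based and inclusive, and that the hit lies on the forward strand. The name `clip_hit` and the `flank` parameter are hypothetical, not part of any package mentioned in this thread.

```python
def clip_hit(sequence: str, start: int, end: int, flank: int = 0) -> str:
    """Return the part of `sequence` covered by a database hit.

    `start` and `end` are assumed to be 1-based, inclusive coordinates on
    the database sequence. `flank` optionally keeps extra residues on each
    side, clamped to the ends of the sequence.
    """
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    lo = max(start - 1 - flank, 0)        # convert to a 0-based slice start
    hi = min(end + flank, len(sequence))  # slice end is exclusive
    return sequence[lo:hi]


if __name__ == "__main__":
    seq = "ACGTACGTAC"
    print(clip_hit(seq, 3, 6))           # residues 3 through 6
    print(clip_hit(seq, 3, 6, flank=2))  # same hit with 2 flanking residues
```

The clipped regions could then be written out together and passed to whichever alignment tool suits the number of sequences.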


Win



More information about the Mol-evol mailing list