On 2006-02-26, harald <please_noSpam at gmx.de> wrote:
> thanks a lot for your quick and detailed answers.
> The papers comparing PSI-Blast and HMM profiles and the one about the
> statistical theory were pretty interesting.
> But since the database which I want to search for homologs is very big
> (~3 million sequences), I think that a tool like hmmsearch would be too slow.
If you are doing 3 million vs. 3 million, then HMM-based methods are
probably too slow for you. If you are doing a few hundred vs. 3
million, then HMM-based methods are OK. It takes a while to do the
iterative search and alignment needed to build a decent HMM, but
scoring sequences with it is not too terrible. I routinely score all
of PDB (about 22,000 sequences), and it usually takes a couple of
minutes for a 140-long HMM. Since running time is proportional to the
number of characters (residues) scored, and 3 million sequences is
roughly 136 times the size of PDB, scoring 3 million sequences would
take about 5 hours (less on a more modern computer). This is feasible
for hundreds of models, but not for millions of models.
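The 5-hour figure is a linear extrapolation from the PDB timing, and a few lines of arithmetic make it explicit. This is only a sketch under stated assumptions: it uses 2 minutes for "a couple of minutes", assumes the large database has about the same average sequence length as PDB (so residue count scales with sequence count), and holds the model length fixed at 140. The function name `estimated_hours` is hypothetical, chosen for illustration.

```python
# Back-of-the-envelope scaling for HMM scoring time.
# Assumptions (not measured): run time is linear in residues scored,
# the big database has PDB-like average sequence length, and every
# model is about 140 positions long, like the one timed on PDB.

PDB_SEQUENCES = 22_000        # size of PDB quoted in the post
PDB_MINUTES = 2.0             # "a couple of minutes" for a 140-long HMM
TARGET_SEQUENCES = 3_000_000  # size of the database being searched


def estimated_hours(n_sequences: int, n_models: int = 1) -> float:
    """Extrapolate scoring time linearly from the PDB timing."""
    minutes_per_sequence = PDB_MINUTES / PDB_SEQUENCES
    return n_sequences * minutes_per_sequence * n_models / 60.0


if __name__ == "__main__":
    print(f"one model:        {estimated_hours(TARGET_SEQUENCES):.1f} h")
    print(f"300 models:       {estimated_hours(TARGET_SEQUENCES, 300):.0f} h")
    print(f"3 million models: {estimated_hours(TARGET_SEQUENCES, 3_000_000):.2e} h")
```

For one model this gives roughly 4.5 hours, consistent with "about 5 hours". Multiplying by the number of models shows why hundreds of models are a manageable batch job while an all-vs-all run over 3 million models is out of reach.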
------------------------------------------------------------
Kevin Karplus   karplus at soe.ucsc.edu   http://www.soe.ucsc.edu/~karplus
Professor of Biomolecular Engineering, University of California, Santa Cruz
Undergraduate and Graduate Director, Bioinformatics
(Senior member, IEEE) (Board of Directors & Chair of Education Committee, ISCB)
life member (LAB, Adventure Cycling, American Youth Hostels)
Effective Cycling Instructor #218-ck (lapsed)
Affiliations for identification only.