[Computational-biology] Re: Looping inside Needleman-Wunsch algorithm & good values for the Similairity Matrix

Philipp Pagel via comp-bio%40net.bio.net (by pDOTpagel from helmholtz-muenchen.de)
Fri Oct 10 03:10:53 EST 2008


almurph from altavista.com <almurph from altavista.com> wrote:

>         Concerning the Needleman-Wunsch algorithm (cf.
> http://en.wikipedia.org/wiki/Needleman-Wunsch_algorithm) I have
> noticed a possible loop.

>         Inside the algorithm there is an important decision making mechanism.
> Its a "if, else if, else if" structure like:

> if(ScoreValue == DiagonalValue + SimilarityValue(i, j)
> {
>         blah;
> }
> else if(Score == Left + d)
> {
>         blah;
> }
> else if(Score == Up + d)
> {
>         blah;
> }

You mean the traceback part - right?

>         I've been playing with value of the similarity matrix and for certain
> values I can get the algorithms to stick as there is no else clause to
> offer an out if none of the above conditions are met.

By definition, one of these three cases must be true. The reason for
that is of course that the dynamic programming matrix was constructed
using these rules in the first place. I see two possible reasons for the
problem you observe:

 1) There is a bug in your program. Either in the construction phase or
    in the traceback

 2) You are experiencing a floating point precision problem.
    Testing floating point numbers for equality can be problematic...

>  It's on this point issue that I want to ask for help. The similaity
> Matrix that I built for my task is ultimately causing this problem. So
> i need to make it better.

> My question is: what properties should a
> good Similarity matrix have to avoid this loop? How does one go about
> constructing an effective Similarity matrix? What properties should it
> have? I'm not referring to the simple structure that score 1 for a
> match and 0 for a mismatch. i'm referring to more complex
> structures...

The above seems to indiate that you mean the scoring matrix when you say
"similarity matrix" - right?.  From the perspective of the algorithm it
does not matter at all - you can use whatever scoring matrix you want.
Of course from the biological point of view you want to choose a matrix
that best matches your problem, question or data.  But that is beside
the point because no scoring matrix will cause the traceback to fail.

cu
	Philipp

-- 
Dr. Philipp Pagel
Lehrstuhl f. Genomorientierte Bioinformatik
Technische Universität München
http://mips.gsf.de/staff/pagel



More information about the Comp-bio mailing list