Interactome algorithms - a problem in graph theory

Lord Snooty bonzo at
Mon Apr 28 04:31:48 EST 2003

Here's a better description of a typical problem, paraphrased from a puzzle
by Shasha. I've solved this one.

Description of Protein Mutagenesis Problem

Suppose there is a path of positive causality from one protein P1 to another
P2, P1 itself is produced, and none of the proteins along that path in
between have been eliminated. Then P2 will be produced also though the
that is produced may vary if other paths between P1 and P2 have eliminations
along the way.
For example, if we have a linear pathway A -> B -> C -> D -> E, then
eliminating A will eliminate all proteins. Eliminating B will eliminate B,
C, D, and E. Eliminating C will eliminate C, D, and E.

Sometimes there can be branching. For example, A may branch to B and C. Then
both B and C may induce D. In that case, eliminating B partly reduces the
output of D.

But we don't know the circuit. We just have certain evidence and we want to
discover the most likely circuit.


There are eight proteins in the first circuit and twelve in the second.
We know that all causalities are positive. In the first case, we know there
is no feedback.
In the second we suspect the feedback is there. Positive feedback only is
assumed {e.g. a cancer circuit).

In each mutagenesis experiment we directly modify one of the proteins
thereby eliminating it. The others are modified through causality as
explained above. There is one piece of bad news: we don't know which one we
modified, just that we have modified one.

Here are the results of the mutagenesis experiments. Since no two of these
results are the same, we know we have directly eliminated each protein in
exactly one
experiment (and possibly others by causality). The default behavior (without
sis) is that all proteins are produced at their full concentrations. Reduced
means that a protein is produced at a smaller concentration than normal.
Zero con-
centration means that a protein is not produced at all (i.e. is eliminated).
So, the first
line here says that A and G are produced at full concentration, D and E at
concentration and B, C, F, and H at zero concentration.

Full: A G    Reduced: D E   Zero: B C F H
Full: A B C D F G H  Reduced:    Zero: E
Full: A B C E G   Reduced: D    Zero: F H
Full: A B C E F G H  Reduced:    Zero: D
Full: A    Reduced: B C D E F H  Zero: G
Full: A B C D E G H  Reduced:    Zero: F
Full: A B D E F G H  Reduced:    Zero: C
Full:    Reduced:    Zero: A B C D E F G H

All possible mutagenesis experiments are performed above,
though we do not know which protein was eliminated in each case. Can we
figure out
at least one circuit from that information? Remember that all links are

Now for the case that may involve some feedback.

Full: A B F G I J L   Reduced: C D H  Zero: E K
Full: A C D E F G H I J K L  Reduced:   Zero: B
Full: B C D E F G H I J K L  Reduced:   Zero: A
Full: A B F G I J L   Reduced: C E K  Zero: D H
Full: A B D E F G H I J K L  Reduced:   Zero: C
Full: A B C D E G H I J K L  Reduced:   Zero: F
Full: A B C E F G H I J K L  Reduced:   Zero: D
Full: D E H J K    Reduced: C   Zero: A B F G I L
Full: A B F G I J L   Reduced: C D H K  Zero: E
Full:     Reduced:   Zero: A B C D E F G H I J K L
Full: A B C D E H I J K L  Reduced:   Zero: F G
Full: A B D E F G H J K L  Reduced: C   Zero: I

We try to design a circuit for this? Remember
that full means as much as in the case where there is no mutagenesis.
Apparently the
reactions don't spin off into infinite production because of some other
dampening factors
not involved. We try to find out how many solutions there are.
(End of example)

The problem is a lot harder when repression is possible, because then the
elimination of an element may actually enhance production.

It is hard to observe causality in action, but one can observe the
effects of disruption and thereby infer causality. This problem would be
much harder
under the more realistic assumption that mutagenesis could affect many

Lord Snooty wrote:

> Imagine we have a set of nodes interconnected in some way, and we wish
> to deduce the connections.
> A connection is defined as directed and reinforcing. A node is defined
> as either ON or OFF.
> For example, the graph
> A->B->C
> implies that C will be ON only if both A and B are ON (logical AND).
> However, in the graph
> A->
>       C
> B->
> C will be ON if either A or B is ON (logical OR).
> We are given the collection of nodes, and also given a table showing the
> ON/OFF node states which are observed when
> each node in turn is turned OFF (inhibited). It may or may not be the
> case that we need to know which actual node was
> inhibited, for each line in the table. For N nodes, we have an N-line
> table with N columns per line, of course.
> Notes
> 1. There may be many possible solutions for a given table, or possibly -
> I am not sure - even no solution.
> 2. No restriction is placed on the graph topology, and in particular we
> allow loops, and nested loops, to exist.
> 3. There's no concept of time; all measurements in the table are made
> after equilibrium is reached.
> My questions are:
> 1. Does a general procedure exist to deduce candidate graphs that
> satisfy the table ?
> 2. How well would such a procedure generalise for both excitatory
> (positive) connections as described above, but also
> for inhibitory (negative) connections?
> 3. How well would such a procedure generalise for a graded response? In
> this case we go from a digital node to an
> analog node, and then not only must we deduce the topology, but also the
> coupling coefficients of the connections.
> Andrew Palfreyman

More information about the Bio-soft mailing list