Finding patterns in genome data

dannyayers at dannyayers at
Sat Jul 8 05:37:55 EST 2000

If anyone has access to a whole load of PCs and would like a job for
them (as a research project or whatever), what about the following:

Problem -
Genetic sequencing data is apparently available, in which sections of
an organism's DNA are represented as strings of base pairs. An
outstanding problem is to make sense of this data by finding the
patterns that are presumably in the sequences (instructions for
creating physiological structures and so on).

Theory -
Techniques exist for finding hidden patterns in this type of data, and
one in particular, Kohonen's Self Organising Map (SOM), strikes me as a
good candidate for application to genome data. This algorithm maps the
objects (for instance, in 'WEBSOM', the objects were newsgroup
postings) onto a grid of connected nodes, each holding a weight vector
in the same n-dimensional space as the objects. After a random
initialisation it iteratively pulls the node that best matches each
object, together with that node's grid neighbours, towards the object,
so that similar objects end up on nearby nodes. In the WEBSOM
application this was used to generate a 2-dimensional map of the
newsgroups' contents, organised by textual similarity. Note that the
technique doesn't care what the contents mean; it just brings out
structural features.
I would suggest that this could be very useful in visualising and/or
analysing genome data.
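
To make the idea a bit more concrete, here is a rough sketch of how a
small SOM might be run over DNA strings. Everything in it is my own
assumption rather than anything taken from WEBSOM or from a real genome
pipeline: the k-mer frequency encoding, the function names
(kmer_vector, train_som, best_cell), the 4x4 grid, the linear decay of
learning rate and radius, and the toy sequences are all made up for
illustration.

```python
import itertools
import math
import random


def kmer_vector(seq, k=2):
    """Return normalised k-mer frequencies of a DNA string (AA, AC, ... order)."""
    kmers = ["".join(p) for p in itertools.product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing N or other symbols
            counts[window] += 1
    total = sum(counts.values()) or 1  # avoid dividing by zero on short input
    return [counts[m] / total for m in kmers]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_cell(grid, vec):
    """Grid coordinates of the node whose weight vector is closest to vec."""
    return min(grid, key=lambda cell: sq_dist(grid[cell], vec))


def train_som(vectors, rows=4, cols=4, epochs=200, seed=0):
    """Train a small rectangular SOM and return {(row, col): weights}."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    # Random initialisation of every node's weight vector.
    grid = {(r, c): [rng.random() for _ in range(dim)]
            for r in range(rows) for c in range(cols)}
    for t in range(epochs):
        frac = 1.0 - t / epochs
        rate = 0.5 * frac                          # learning rate decays linearly
        radius = 0.5 + max(rows, cols) / 2 * frac  # neighbourhood shrinks too
        for vec in rng.sample(vectors, len(vectors)):  # shuffled pass
            br, bc = best_cell(grid, vec)
            for (r, c), w in grid.items():
                d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-d2 / (2 * radius ** 2))  # Gaussian neighbourhood
                for i in range(dim):
                    w[i] += rate * h * (vec[i] - w[i])
    return grid


if __name__ == "__main__":
    # Toy sequences, invented for this example only.
    sequences = {
        "at_rich_1": "ATATTTAAATATTATAAATTTATA",
        "at_rich_2": "TTATAAATATATTTAATATAATTA",
        "gc_rich_1": "GCGGCCGCGCCGGGCGCGCCGCGG",
        "gc_rich_2": "CCGCGGGCGCGCCGGCGCGGCCGC",
    }
    vectors = {name: kmer_vector(s) for name, s in sequences.items()}
    som = train_som(list(vectors.values()))
    for name, vec in vectors.items():
        print(name, best_cell(som, vec))  # grid cell each sequence lands on
```

The printed grid cells are the "map": sequences with similar k-mer
make-up should tend to land on the same or neighbouring cells. For real
genome data the sequences would presumably be cut into windows first,
and the per-window training work is the part that could be farmed out
to many machines.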

Application -
The Jini/Javaspaces system from Sun has a great deal of potential for
carrying out large scale distributed computational tasks, and I believe
would be perfect for a job like this.

If anyone has any thoughts on this (or even tries it), please let me
know.



More information about the Bioforum mailing list