# contig-building software from ICRF

R_MOTT at ICRF.AC.UK
Fri Oct 30 12:33:00 EST 1992

Path: wheel!r_mott
From: r_mott at uk.ac.icrf
Newsgroups: biosci.software
Subject: contig-building software from ICRF
Message-ID: <50598 at uk.ac.icrf>
Date: 30 Oct 92 17:33:30 GMT
Organization: Imperial Cancer Research Fund, London, UK
Lines: 123

Software Tools for Ordering Clone Libraries from Probe Hybridisation Data
=========================================================================

Richard Mott and Andrei Grigoriev

Imperial Cancer Research Fund
Genome Analysis Laboratory
44 Lincoln's Inn Fields
London WC2A 3PX
UK

email rmott at gea.lif.icnet.uk, a_grigoriev at gea.lif.icnet.uk

A suite of programs for manipulating, displaying and ordering probe
hybridisation data is now available from the EMBL software server and
from the authors. The software was written to aid the construction of
YAC, P1 and cosmid maps of the fission yeast S. pombe (Maier et al
1992, Hoheisel et al 1992) and so has been tested thoroughly on real
data. All the programs are written in C, with a subset using the XView
and Xlib graphics libraries. The programs are described in Mott et al
(1992).

Briefly, in a hybridisation experiment a labelled DNA probe is
hybridised onto a filter containing a clone library spotted out in a
high-density grid. Clones that are positive for the probe show up as
dark spots on an autoradiograph of the filter. If the probe is
single-copy then all clones hybridising with it will lie in the same
region of the genome, and probes with common positive clones should be
neighbours.  By hybridising a sufficient number of probes to filters
it is possible to order the library into contigs of overlapping
clones. The task of ordering libraries is complicated by experimental
noise (mis-scorings, repetitive probes, chimeric clones etc.), which
makes a naive ordering approach untenable, so it was necessary to
develop robust ordering algorithms.
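
The neighbour rule above can be illustrated with a small sketch. This is
not code from the ICRF suite (which is written in C); the probe and clone
names and the dictionary layout are hypothetical, chosen only to show how
probes sharing positive clones can be paired up.

```python
from itertools import combinations

# Hypothetical toy data: each probe maps to the set of clones it hit.
hits = {
    "p1": {"c1", "c2"},
    "p2": {"c2", "c3"},
    "p3": {"c3", "c4"},
    "p4": {"c4", "c5"},
}

def shared_clones(hits):
    """Return {(probe_a, probe_b): number of clones positive for both}."""
    return {
        (a, b): len(hits[a] & hits[b])
        for a, b in combinations(sorted(hits), 2)
    }

overlaps = shared_clones(hits)

# Probe pairs with at least one common positive clone are candidate neighbours.
linked = [pair for pair, n in overlaps.items() if n > 0]
print(linked)  # [('p1', 'p2'), ('p2', 'p3'), ('p3', 'p4')]
```

With noise-free single-copy probes the linked pairs form a simple chain,
as here. A repetitive probe or a chimeric clone would add links between
distant probes, which is why the naive rule fails on real data.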

Those programs that filter or rearrange the data write output files in
the same format as the inputs, so that they may be used as inputs to
the other applications. For example, the user might extract a subset
of the data with one program, order it with another and then display
the results with a third. There are three types of data file used by
the programs: hybridisation data, contig data (lists of contigs, i.e.
ordered probes) and map data (a list of probes ordered from the
genetic map or from some other map).

With such a large volume of data an important requirement is to view
hybridisation data easily. To this end we have written a PostScript
generator, SHOW, which displays a set of probe-clone hybridisations as
a matrix, with the probes as columns and the clones as rows. Where a
probe and clone hybridise, the corresponding row-column
intersection is printed black. If the data have been ordered then
contigs appear as overlapping runs of positives.  Repetitive probes
are immediately apparent as off-diagonal vertical stacks of positives,
chimeric clones as off-diagonal horizontal lines and random false
positives as isolated dark spots.  SHOW can either summarize the
entire data-set on a single page, or show it in greater detail spread
over several pages.  Annotations or labels may be attached to the
clones and probes.

If the data have been ordered into contigs then the positives should
occur in overlapping runs and inconsistencies in the data are
then immediately apparent by eye because all the hybridisations to
each clone are visible, including those which do not fit well with the
current order of clones and probes.  This is in contrast with the
usual representation of a physical map, where clones are summarised as
intervals which have been packed into as few lines as possible.
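
As a rough illustration of the matrix view, the sketch below prints a
text version of a clone-probe matrix. It is not SHOW itself (which
generates PostScript); the function name, the data layout and the '#'
and '.' symbols are assumptions made for this example.

```python
def render_matrix(clone_order, probe_order, hits):
    """Return one text row per clone; '#' marks a positive, '.' a negative.

    Probes run across the columns and clones down the rows.
    """
    lines = []
    for clone in clone_order:
        cells = "".join("#" if clone in hits[p] else "." for p in probe_order)
        lines.append(f"{clone} {cells}")
    return lines

# Hypothetical ordered data set: four probes, five clones.
hits = {
    "p1": {"c1", "c2"},
    "p2": {"c2", "c3"},
    "p3": {"c3", "c4"},
    "p4": {"c4", "c5"},
}
probe_order = ["p1", "p2", "p3", "p4"]
clone_order = ["c1", "c2", "c3", "c4", "c5"]

rows = render_matrix(clone_order, probe_order, hits)
print("\n".join(rows))
# c1 #...
# c2 ##..
# c3 .##.
# c4 ..##
# c5 ...#
```

In ordered data the positives fall along the diagonal in overlapping
runs. A positive far from the diagonal would stand out in the same way
as the off-diagonal marks described above.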

XVSHOW is an XView analogue to SHOW, displaying a scrollable portion
of the clone-probe matrix on the screen of a workstation running the
OpenWindows window manager.

SELECT is a menu-driven program for choosing subsets of hybridisation
data. A user can perform Boolean operations to include or exclude
clones which hybridise to particular probes or classes of probes.
UNHIT is a non-interactive program for selecting clones and probes,
which also gives a list of clones not yet hit by a probe.  This was
used to pick further cosmid probes for S. pombe when sampling
without replacement.  FILTER removes clones which are obviously noisy,
such as well contaminants, or which hybridise to unrealistically large
numbers of probes and thus are very likely to contain highly
repetitive elements.
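
The two selection ideas in this paragraph can be sketched as follows.
This is only an illustration, not the UNHIT or FILTER code: the function
names, the data layout and the max_probes threshold are all hypothetical,
and the real programs may apply different criteria.

```python
from collections import Counter

def unhit_clones(all_clones, hits):
    """Clones in the library that no probe has hit yet."""
    hit = set().union(*hits.values())
    return sorted(set(all_clones) - hit)

def drop_promiscuous_clones(hits, max_probes):
    """Remove clones positive for more than max_probes probes.

    Returns the cleaned data and the sorted list of removed clones.
    The threshold is an assumed parameter for this sketch.
    """
    counts = Counter(c for clones in hits.values() for c in clones)
    noisy = {c for c, n in counts.items() if n > max_probes}
    cleaned = {p: clones - noisy for p, clones in hits.items()}
    return cleaned, sorted(noisy)

# Hypothetical data: clone "cX" is positive for every probe.
all_clones = ["c1", "c2", "c3", "c4", "c5", "cX"]
hits = {
    "p1": {"c1", "c2", "cX"},
    "p2": {"c2", "c3", "cX"},
    "p3": {"c3", "c4", "cX"},
}

print(unhit_clones(all_clones, hits))  # ['c5']
cleaned, removed = drop_promiscuous_clones(hits, max_probes=2)
print(removed)  # ['cX']
```

Unhit clones such as c5 are natural candidates for the next round of
probes when sampling without replacement.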

REORDER is a tool to reorder a set of clones to a given order of
probes.  A probe order can be fed into REORDER to generate the clone
order, which is then displayed using SHOW.  One can optionally specify
a set of "sleeping probes", which are ignored when ordering the
clones but which are output next to the probes used to order the
clones.  Consequently one may check the consistency of the contigs
found with one set of probes (e.g. cosmid probes on cosmid filters)
against the hybridisations of another set (such as YAC probes on
cosmid filters).
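
One simple way to fit clones to a probe order is sketched below. The
announcement does not say how REORDER places clones, so the rule used
here (sort clones by the mean position of the probes they hit, skipping
sleeping probes) is an assumption, as are the function and variable
names.

```python
def fit_clones(probe_order, hits, sleeping=()):
    """Order clones by the mean position of the probes they hybridise to.

    Probes listed in `sleeping` keep their place in probe_order but do
    not influence the clone order. Clones hit only by sleeping probes
    are left out of the result.
    """
    position = {p: i for i, p in enumerate(probe_order) if p not in sleeping}
    sums = {}
    for probe, idx in position.items():
        for clone in hits.get(probe, ()):
            total, n = sums.get(clone, (0, 0))
            sums[clone] = (total + idx, n + 1)
    # Ties are broken by clone name so the result is deterministic.
    return sorted(sums, key=lambda c: (sums[c][0] / sums[c][1], c))

# Hypothetical data: "y1" stands for a probe from a second set whose
# hybridisations disagree with the cosmid probes p1..p3.
hits = {
    "p1": {"c1", "c2"},
    "p2": {"c2", "c3"},
    "y1": {"c1", "c4"},
    "p3": {"c3", "c4"},
}
probe_order = ["p1", "p2", "y1", "p3"]

with_sleeping = fit_clones(probe_order, hits, sleeping={"y1"})
without_sleeping = fit_clones(probe_order, hits)
print(with_sleeping)     # ['c1', 'c2', 'c3', 'c4']
print(without_sleeping)  # ['c2', 'c1', 'c3', 'c4']
```

Treating y1 as sleeping leaves the clone order determined by the cosmid
probes alone, so its hybridisations can be inspected against that order
without disturbing it.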

XVEDIT is a contig editor running under XView which allows the user to
edit probe orderings, moving, deleting and inserting probes and
contigs and fitting the clones to the resulting orders. It is
essentially a graphical interactive version of REORDER.

PROBEORDER, BARR and COSTIG are programs for ordering probes.  They
all assume that the probes are single-copy (although BARR and COSTIG
will attempt to filter out probes which are likely to be repetitive),
and work by ordering the probes and then fitting the clones to the
probe order. PROBEORDER calculates a distance measure between each
pair of probes and then uses simulated annealing to find the shortest
path connecting all the probes. BARR and COSTIG use heuristic rules to
eliminate possible repetitive probes and chimeric clones, cleaning
the data into a consistent set which can then be ordered
directly.
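
The distance-plus-annealing approach can be sketched in a few lines.
This is not the PROBEORDER implementation: the distance measure (one
minus the Jaccard similarity of the two clone sets), the segment-reversal
move, the cooling schedule and every parameter value are assumptions
chosen for illustration.

```python
import math
import random

def distance(a, b):
    """1 - Jaccard similarity of two clone sets (an assumed measure)."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 1.0

def path_length(order, hits):
    """Sum of distances between consecutive probes in the order."""
    return sum(distance(hits[p], hits[q]) for p, q in zip(order, order[1:]))

def anneal(hits, start, steps=5000, t0=1.0, cooling=0.999, seed=0):
    """Search for a short probe path by simulated annealing.

    Each step reverses a random segment of the current order. Worse
    orders are accepted with probability exp(-delta / temperature).
    The best order seen so far is returned with its path length.
    """
    rng = random.Random(seed)
    current = list(start)
    cur_len = path_length(current, hits)
    best, best_len = current[:], cur_len
    temp = t0
    for _ in range(steps):
        i, j = sorted(rng.sample(range(len(current)), 2))
        candidate = current[:i] + current[i:j + 1][::-1] + current[j + 1:]
        cand_len = path_length(candidate, hits)
        delta = cand_len - cur_len
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, cur_len = candidate, cand_len
            if cur_len < best_len:
                best, best_len = current[:], cur_len
        temp *= cooling
    return best, best_len

# Hypothetical data: five probes forming a single contig, given shuffled.
hits = {
    "p1": {"c1", "c2"},
    "p2": {"c2", "c3"},
    "p3": {"c3", "c4"},
    "p4": {"c4", "c5"},
    "p5": {"c5", "c6"},
}
start = ["p3", "p1", "p5", "p2", "p4"]
start_len = path_length(start, hits)

best, best_len = anneal(hits, start)
print(best, round(best_len, 3))
```

Because the path length is the same read in either direction, an order
and its reverse are equivalent. Keeping the best order seen so far means
the result is never worse than the starting order, although annealing
does not guarantee the global minimum.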

PATTERNORDER orders clones by computing all pairwise clone distances
and then using simulated annealing to find that order of clones with
minimum path length.

Hoheisel, J., Maier, E., Mott, R., McCarthy, L. Submitted to Cell.

Maier, E., Hoheisel, J., McCarthy, L., Mott, R., Grigoriev, A., Monaco, A.,
Larin, Z. and Lehrach, H. (1992) Nature Genetics 1, 273-277.

Mott, R., Grigoriev, A., Maier, E., Hoheisel, J., Lehrach, H. Submitted to
Nucleic Acids Research.