A framework for cytome exploration - update 8 Feb. 2005

Peter Van Osta pvosta_NO_SPAM_ at _NO_SPAM_cs.com
Tue Feb 8 03:50:02 EST 2005


As the on-line version of my article on the Human Cytome Project and the
application of cytomics in medicine and drug discovery (pharmaceutical
research) evolves, I put the updated version in this newsgroup for
reference. The original "question" on a Human Cytome Project was posted in
this newsgroup on Monday 1 December 2003.

Original on-line version:


A framework for cytome exploration (this article):

A framework for cytome exploration
By Peter Van Osta

An entire organism is an anisotropic, densely packed, 4D grid (or matrix)
of a high order of “recursive” information levels. We can study its
structure and function at multiple levels, where the structure and
function at each level is intertwined with over- and underlying structures
and their function. The genotype and the phenotype both exist in a
continuum of (bidirectional) interacting organizational levels.

Here I want to present and discuss some ideas on the exploration of the
cytome and the conversion of the spatial, spectral and temporal properties
of the cytome and its cells into their in-silico digital representation.
It is a set of ideas about a concept which is still changing and growing,
so do not expect anything final or polished yet.

A modular and distributed framework should provide a unified approach to
the management of the quantitative analysis of space (X, Y and Z),
spectrum (wavelength) and time (t) related phenomena. We want to go from
physics to quantitative features and finally come to a classification and
understanding of the underlying biological process. We want to extract
attributes from the physical process which give us information about
the status and development of the process and its underlying structures.

First we have to create an in-silico digital representation starting from
the analogue reality captured by an instrument. The second stage (after
creation of an in-silico representation) is to extract meaningful parts
(objects) related to biologically relevant structures and processes.
Thirdly we attach features, such as area and (spectral) intensity, to the
extracted objects; these represent (relevant) attributes of the
observed structure and process. Finally we have to separate and cluster
objects based on their feature properties into biologically relevant
subgroups, such as healthy versus disease.
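
The four stages above can be sketched in a few lines of Python. This is
only an illustration under stated assumptions, not the framework itself:
the toy image, the threshold, the 4-connected grouping and the naive
area rule (function names extract_objects, measure and classify) are all
hypothetical choices made for the example.

```python
from collections import deque


def extract_objects(image, threshold):
    """Stage 2: group above-threshold pixels into 4-connected objects."""
    rows, cols = len(image), len(image[0])
    seen = set()
    objects = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] < threshold or (r, c) in seen:
                continue
            queue, pixels = deque([(r, c)]), []
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and (ny, nx) not in seen
                            and image[ny][nx] >= threshold):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            objects.append(pixels)
    return objects


def measure(image, pixels):
    """Stage 3: attach features (area, mean intensity) to one object."""
    total = sum(image[y][x] for y, x in pixels)
    return {"area": len(pixels), "mean_intensity": total / len(pixels)}


def classify(features, area_cutoff):
    """Stage 4: a deliberately naive rule standing in for real clustering."""
    return "large" if features["area"] >= area_cutoff else "small"


# Stage 1: the in-silico representation, here a toy 2D intensity grid.
image = [
    [0, 0, 0, 0, 0, 0],
    [0, 9, 9, 0, 0, 0],
    [0, 9, 9, 0, 0, 7],
    [0, 0, 0, 0, 0, 0],
]
objects = extract_objects(image, threshold=5)
features = [measure(image, obj) for obj in objects]
labels = [classify(f, area_cutoff=3) for f in features]
```

Each stage only consumes the output of the previous one, which is the
property that makes it possible to treat them as separate modules.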

In order to quantify the physical properties of space and time of a
biological sample we must be able to create an appropriate digital
representation of these physical properties in-silico. This digital
representation is then accessible to algorithms for content extraction.
The content or objects of interest are then to be presented to a
quantification engine which associates physically meaningful properties or
features with the extracted objects. These object features build a
multidimensional feature space which can be inserted into feature
analysers to find object/feature clusters, trends, associations and
correlations.

Managing the flow

My personal interest is to build a framework in which acquisition,
detection and quantification are designed as modules each using plug-ins
to do the actual work and which operate on objects being transferred
through the framework. Data representing space, time and spectral sampling
are distributed throughout a data management system to be processed.  The
data flow through the framework and are subjected to plug-in modules which
operate on the data and transform the content from one representation
space into another, such as from physics to features. The focus is not on the
individual device to create the data or on individual algorithms, but on
the management of the dataflow through a distributed system to convert
spatial, spectral and temporal data into a feature (hyper-) space for
quantitative analysis. The software framework manages the entire flow and
transformation of data from physics to features, like a ball which is
thrown from player to player. Up- and downscaling of cell-based research
is dynamically managed by the system as the scale of processing does not
require a change in basic design.

I will mostly focus on imaging technology, but the basic principles should
be applicable to any digitized content extraction process. Images are
digital information matrices of a higher order; they only become images as
such when we want to look at them and have to transform them into
something which is meaningful for our visual system. Visualisation
provides us with a window on the data content, but not necessarily on the
data as such.

Probing the sample

We want to extract from the sample its structure and its dynamics or the
flow of its structural changes through time. When applying digital imaging
technology to a biological sample, a clear understanding of the physical
characteristics of the sample and its interaction with the “sampling”
device is a prerequisite for a successful application of technology.

The basic principle of a digital imaging system is to create a digital
in-silico representation of the spatial, temporal and spectral physical
process which is being studied. In order to achieve this we try to lay
down an equidistant sampling grid over the biological specimen. In reality
the physical layout of this sampling grid is never a precisely isotropic
cubic sampling pattern. The temporal and spectral sampling inner and
outer resolution is determined by the physical characteristics of the
sample (electromagnetic spectral range and spectral sampling layout) and
the interaction with the detection technology being used.

The instrument which converts the spatial (scale, dimensions), spectral
(electromagnetic energy, wavelength) and temporal continuum of the sample
into its digital representation allows us to take a view on biology beyond
the capacity of our own perceptive system. It rescales space, spectrum and
time into a digital representation accessible to human perception
(contrast-range, colour) and ideally also to quantification. Instruments
rescale spatial dimensions, spectral ranges and time into a scale which is
accessible to the human mind. The digital image acts as a see-through
window on a part of the physical properties of the biological sample, not
on the instrument as such.

We want to insert a probe system into the sample which changes its state
according to the physical characteristics of the sample. A probe is in
general a dual system, a structure/function reporter on one side and an
appropriate detector on the other side. The changes in the probe system
are ideally perfectly aligned in a spatial-spectral and temporal space
with the physical properties of the sample itself in space and time. Each
probe system senses the state of the specimen with a finite aperture and
so provides us with a view on the biological structure. All sensing is
done in a 5 dimensional environment, in 3D space, spectrum (wavelength)
and time. It is the inner and outer resolution of our sampling which
changes. When we do 2D imaging, this is the same as 3D with the 3rd
dimension collapsed to one layer, but due to the Depth of Focus (D.O.F.)
of the optical system we use, this represents a physical Z-slice.
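
One way to picture this is a small description of the five sampling axes.
The sketch below is my own reading of the text, not a definition from it:
it assumes that "inner resolution" can be modelled as the spacing between
samples and "outer resolution" as the total range covered. The class name
Axis and all numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Axis:
    name: str
    step: float   # assumed meaning of inner resolution: sample spacing
    count: int    # number of samples taken along this axis
    unit: str

    @property
    def extent(self) -> float:
        """Assumed meaning of outer resolution: total range covered."""
        return self.step * self.count


def grid_shape(axes):
    """Shape of the in-silico matrix that these axes describe."""
    return tuple(axis.count for axis in axes)


# A "2D" image is the same 5D grid with Z, wavelength and time collapsed
# to a single sample. The single Z sample still has a physical thickness,
# given here by an assumed depth of focus of 1.2 um.
axes_2d = (
    Axis("x", 0.5, 1024, "um"),
    Axis("y", 0.5, 1024, "um"),
    Axis("z", 1.2, 1, "um"),
    Axis("lambda", 300.0, 1, "nm"),
    Axis("t", 0.05, 1, "s"),
)
```

Here grid_shape(axes_2d) gives (1024, 1024, 1, 1, 1): the data are still
five-dimensional, only three of the axes hold a single sample.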

In the spectral domain we also probe electromagnetic energy along the
spectral axis with a certain inner and outer resolution. We slide up and
down the spectral axis within the spectral limits of the probing system,
which transforms analogue electromagnetic energy into its digital
representation. A single CCD camera probes the visible spectrum (and
beyond) in one sweep, with a rather poor inner resolution. A 3CCD camera
uses 3 probes to do its spectral sampling and gives us a threefold
increase in inner resolution. Increasing or decreasing the density of the
spectral sampling is only a matter of spectral dynamics. By using n
cameras (or PMTs, etc.), each with individually controlled spectral
settings, we can expand or collapse our spectral inner and outer
resolution. We tend to use
“spectral imaging” for anything which samples the visible spectrum
with more than the spectral resolution of a 3CCD camera. Up- and
downscaling our spectral sampling from broad to narrow, parallel or
sequential, continuous or discontinuous is a matter of applying an
appropriate detector array. A system which manages 1 to n spectral probing
devices such as cameras or PMTs (or a spectral filter in front of a single
detector), each sampling a part of the spectrum and spatially aligned with
the others, allows the spectrum to be probed in a dynamic way.

The time axis is also probed with a varying temporal inner and outer
resolution and, depending on the characteristics of the detection device,
the time-slicing can be collapsed or expanded. Time can be sampled
continuously or discontinuously (time-lapse). We can expand or collapse
the temporal resolution of the detector in order to capture (temporal
integration) weak signals or shorten the time-slicing down to the minimum
achievable with a given detector.

In order to compensate for sensitivity deficits of a detector, three
strategies for improvement can be followed, but all three decrease the
sampling resolution. Spatial, spectral and temporal signal integration can
be used by expanding the physical scale of capturing along the spatial,
temporal or spectral axis or in combinations. Using a B/W camera instead
of a 3CCD camera is a way of spectral integration, but gives a threefold
reduction in spectral sampling.
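
The trade-off between signal and sampling resolution can be made concrete
with plain summation. The sketch below assumes that integration simply
means adding neighbouring samples together; the function names and the
toy data are hypothetical.

```python
def integrate_spectral(pixel_rgb):
    """Collapse three spectral samples into one (3CCD to monochrome)."""
    return sum(pixel_rgb)


def bin_spatial(image, factor):
    """Sum factor x factor pixel blocks; sides must be multiples of factor."""
    rows, cols = len(image), len(image[0])
    return [
        [
            sum(image[r + dr][c + dc]
                for dr in range(factor) for dc in range(factor))
            for c in range(0, cols, factor)
        ]
        for r in range(0, rows, factor)
    ]


def integrate_temporal(frames):
    """Sum successive frames pixel by pixel (longer effective exposure)."""
    return [[sum(values) for values in zip(*rows)] for rows in zip(*frames)]


weak = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
]
binned = bin_spatial(weak, 2)               # 2x4 pixels become 1x2
summed = integrate_temporal([weak, weak])   # two time points become one
mono = integrate_spectral((10, 20, 30))     # three channels become one
```

In every case the signal per sample goes up while the number of samples
along the integrated axis goes down by the same factor.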

The result of the detection is a 5-dimensional system expanding or
collapsing each dimension (XYZ, lambda, time) according to the
requirements of exploration. The device and its components, attached to
the exploration core, impose the inner and outer resolution limits upon the
system. In-silico these are only high-order matrix arrays representing a
5D space. We could call this a continuously variable in-silico

The inner and outer resolution of the probing system is determined by the
physical XYZ sampling characteristics of the sampling device, such as its
point spread function (PSF). For a digital microscope the resolving power
of the objective (XYZ) and its depth of view/focus are important issues in
experimental design and determining the application range of a device. The
interaction of the detection device with the image created by the optics
of the system such as Nyquist sampling demands, distribution of spectral
sensitivity, dynamic range, also plays an important role.
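
As a worked example of the Nyquist sampling demand, the sketch below
estimates the largest pixel size (in sample space) that still samples the
optical resolution at least twice. It assumes the Rayleigh criterion
d = 0.61 * wavelength / NA as the measure of lateral resolution, which is
only one of several conventions, and the wavelength, numerical aperture
and camera pixel size are example values chosen for illustration.

```python
def rayleigh_resolution_nm(wavelength_nm, numerical_aperture):
    """Lateral resolution by the Rayleigh criterion (an assumed convention)."""
    return 0.61 * wavelength_nm / numerical_aperture


def max_pixel_size_nm(wavelength_nm, numerical_aperture, samples=2.0):
    """Largest sample-space pixel giving `samples` pixels per resolved distance."""
    return rayleigh_resolution_nm(wavelength_nm, numerical_aperture) / samples


def min_magnification(camera_pixel_um, wavelength_nm, numerical_aperture):
    """Magnification needed so that a camera pixel meets the Nyquist demand."""
    limit_nm = max_pixel_size_nm(wavelength_nm, numerical_aperture)
    return camera_pixel_um * 1000.0 / limit_nm


# Example values: green emission, high-NA oil objective, 6.45 um camera pixels.
resolution = rayleigh_resolution_nm(520.0, 1.4)
pixel_limit = max_pixel_size_nm(520.0, 1.4)
magnification = min_magnification(6.45, 520.0, 1.4)
```

With these numbers the resolution is about 227 nm, the pixel limit about
113 nm and the required magnification about 57x. In practice a sampling
factor somewhat above 2 is often used, which raises the requirement.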


To increase the throughput of exploration we try to do multiple
experiments simultaneously to obtain multiple readouts at once. By doing
this we can postpone the choice of the micro-experiment from which we
shall continue our exploration. The exploration of samples is organised in
an array-pattern (in general 2D due to technical limitations), ranging
from a single tissue slice on a glass slide up to a large scale grid of,
for instance, cell or tissue expression arrays. Biological samples, up to
tissue samples, are small enough to allow for multiplexing experiments and
they do not require large amounts of reagents in huge containers.
Multiplexing experiments with entire elephants would be somewhat
cumbersome, but DNA, protein, cells and parts of tissue fit nicely into
our instruments. Scaffold cultures would allow us to use the 3rd dimension
if we can properly capture its content. Dynamic scaffold culturing would
allow us to disassemble the culture for content exploration and reassemble
it for continuation of the experiment (the ultimate scaffold culture is
the organism itself).

DNA and protein arrays are arrays of the first degree, as each sample in
an array in itself provides us with a scalar readout; there is no further
spatial differentiation. Cell arrays are of the second or third degree,
depending on the content (how many cells per array coordinate) and the
resolution of the readout. In an array of the second degree each array
coordinate is in itself an array as it is not a homogeneous sample
(multiple cells), but readout resolution is limited to the sub elements.
In an array of the third degree each of the sub elements is also
compartmentalized (e.g. tissue arrays, sub-cellular organelles, nuclear
organization) and each array coordinate is explored at sufficient
resolution. By using arrays with multiple cells at each coordinate, we can
create readout cascades at multiple readout resolutions. This way we can
combine speed and simplicity for a quick overview and switch to more
detail, to find out about cellular heterogeneity and/or sub-cellular

When we construct arrays with compartmentalized elements, we can up- and
downscale our exploration without the need to redo an entire experiment
and so extract more content from the experiment when wanted. The
experiment is arranged, and its content extracted, in the way Russian
dolls fit into each other. When the array consists of living cells or
tissue, we can add the time dimension to our experiment and create a 4D
array for experimental multiplexing.
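
The nesting of readouts can be illustrated with a small nested data
structure. The layout below (array coordinate, then cells, then
compartments) and the intensity values are hypothetical; the point is only
that the same stored content can be read out at the first, second or third
degree without repeating the experiment.

```python
# Array coordinate -> list of cells -> compartment intensities (toy values).
plate = {
    "A1": [
        {"nucleus": 40.0, "cytoplasm": 10.0},
        {"nucleus": 20.0, "cytoplasm": 30.0},
    ],
    "A2": [
        {"nucleus": 5.0, "cytoplasm": 5.0},
    ],
}


def first_degree(plate):
    """One scalar per array coordinate (total signal)."""
    return {pos: sum(sum(cell.values()) for cell in cells)
            for pos, cells in plate.items()}


def second_degree(plate):
    """One scalar per cell at each array coordinate."""
    return {pos: [sum(cell.values()) for cell in cells]
            for pos, cells in plate.items()}


def third_degree(plate, compartment):
    """One scalar per named compartment of each cell."""
    return {pos: [cell[compartment] for cell in cells]
            for pos, cells in plate.items()}


overview = first_degree(plate)
per_cell = second_degree(plate)
per_nucleus = third_degree(plate, "nucleus")
```

At coordinate A1 the two cells look identical at the second degree (50.0
each) and only the third-degree readout shows that their nuclei differ.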

The granularity or density of the array pattern is determined by the
experimental demands and upstream and downstream processing capacity. Of
course the optical characteristics of the sample carrier (glass, plastic)
will determine the spatial sampling limits in its inner and outer
resolution. The optical and mechanical characteristics of the device used
to explore the (sub) cellular physical domain will also lead to a spatial,
spectral and temporal application domain. The coarse grid-like pattern of
samples on a sample carrier is being explored at each array position at
the appropriate inner and outer resolution, within the optical physical
boundaries of the device used to capture the data. The outer resolution
barrier of the individual detector in space and time is extended by both
spatial and temporal tiling at a range of intervals. Spectral multiplexing
is being done by using spectral selection devices with the appropriate
spectral characteristics for the spectral profile of the sample.
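
Extending the outer resolution by tiling amounts to computing a set of
start positions along each axis. The sketch below assumes a square field
of view and a fixed overlap between neighbouring tiles; the function names
and the example dimensions are hypothetical. The same one-dimensional
helper could be applied to the time axis for temporal tiling.

```python
import math


def tile_origins(extent, field, overlap=0.0):
    """Start positions of the tiles needed to cover `extent` along one axis."""
    if field <= overlap:
        raise ValueError("field of view must exceed the overlap")
    if extent <= field:
        return [0.0]
    stride = field - overlap
    count = math.ceil((extent - field) / stride) + 1
    return [i * stride for i in range(count)]


def tile_grid(width, height, field, overlap=0.0):
    """Row-major (x, y) origins of the tiles covering a rectangular area."""
    return [(x, y)
            for y in tile_origins(height, field, overlap)
            for x in tile_origins(width, field, overlap)]


# A 1000 um wide strip scanned with a 400 um field and 100 um overlap.
origins = tile_origins(1000.0, 400.0, 100.0)
grid = tile_grid(1000.0, 400.0, 400.0, 100.0)
```

The last tile starts at 600 um and ends exactly at 1000 um, so the strip is
covered by three fields of view.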

Feedback loops on the content-flow

The detection cascade is not a one way passive flow of events, but we can
place content-driven feedback systems into the dataflow. Active feedback
and control depends on the degree of automation and flexibility of the
detection system. The spatial content capturing can be driven by a plug-in
which controls the spatial sampling in order to sample within the physical
boundaries of a sample (e.g. adaptive tissue scanning in 2D or 3D and
beyond). A plug-in is docked into the system to modify its behaviour and
make it respond to content changes. The decision process can be based on
a set of rules, implemented as a neural network, fuzzy logic or whatever
is appropriate. Spatial, spectral and temporal events can drive the
process to create a content-driven acquisition process. Feedback loops
cross the dimension and scale boundaries: a spectral change can drive a
change in spatial layout, etc. A content-driven
time-lapse will change its temporal pacing whenever a meaningful event is
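
A content-driven pacing plug-in of this kind could look roughly like the
sketch below. It assumes the simplest possible rule, a threshold on the
summed absolute difference between two successive frames; the class name
ChangeDrivenPacing, its parameters and the toy frames are hypothetical, and
a real system might use a neural network or fuzzy logic instead.

```python
class ChangeDrivenPacing:
    """Hypothetical plug-in: choose the next time-lapse interval from content."""

    def __init__(self, base_interval_s, fast_interval_s, change_threshold):
        self.base_interval_s = base_interval_s
        self.fast_interval_s = fast_interval_s
        self.change_threshold = change_threshold
        self._previous = None

    def on_frame(self, frame):
        """Return the delay in seconds before the next acquisition."""
        interval = self.base_interval_s
        if self._previous is not None:
            change = sum(abs(a - b) for a, b in zip(self._previous, frame))
            if change >= self.change_threshold:
                interval = self.fast_interval_s
        self._previous = list(frame)
        return interval


pacing = ChangeDrivenPacing(base_interval_s=60.0, fast_interval_s=5.0,
                            change_threshold=10)
frames = [[1, 1, 1], [1, 2, 1], [9, 9, 9], [9, 9, 9]]
intervals = [pacing.on_frame(frame) for frame in frames]
```

The third frame differs strongly from the second, so the plug-in asks for
the short interval once and falls back to the base interval afterwards.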

Object extraction

The detection of appropriate objects for further quantification is done
either in-line within the acquisition process or distributed to another
process dealing with the object extraction. Objects should be aligned with
biological structures and processes. The pixel or voxel representation
in-silico however is basically “unaware” of this meta-information
about how the digital density pattern was created. The physical meaning of
one data point will change depending on the spatial, temporal and spectral
sampling and its inner and outer resolution. The digital data build a
(dis)continuous representation of a spatial, spectral and temporal
continuum which expands or collapses in an anisotropic way.

The content of the data is of no meaning for a data-transfer system as
such; it only transfers the content throughout its dependencies.
Analytical tools operating on the data content need to be informed about
the layout of the data. Detection and quantification algorithms act on the
digital information as such and only the back-translation into physically
meaningful data requires a back-propagation into the real-world layout and
dimensions. The resulting discrete representation of the sampled spatial,
spectral and temporal grid at each array position is being sent to a
storage medium (file system, database…) to provide an audit trail for
quality assessment and data validation.
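
The back-translation from digital counts to physical quantities needs
nothing more than the sampling layout travelling with the data. The sketch
below assumes a layout with equal X and Y spacing, a separate Z step and a
fixed frame interval; the class name SamplingLayout and the example values
are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SamplingLayout:
    pixel_size_um: float      # lateral spacing, assumed equal in X and Y
    z_step_um: float          # axial spacing, usually coarser than lateral
    frame_interval_s: float   # time between successive frames


def area_um2(pixel_count, layout):
    """Convert an object area from pixels to square micrometres."""
    return pixel_count * layout.pixel_size_um ** 2


def volume_um3(voxel_count, layout):
    """Convert an object volume from voxels to cubic micrometres."""
    return voxel_count * layout.pixel_size_um ** 2 * layout.z_step_um


def speed_um_per_s(displacement_px, frames_elapsed, layout):
    """Convert a lateral displacement in pixels per frame count to um/s."""
    distance_um = displacement_px * layout.pixel_size_um
    return distance_um / (frames_elapsed * layout.frame_interval_s)


layout = SamplingLayout(pixel_size_um=0.5, z_step_um=2.0, frame_interval_s=10.0)
area = area_um2(400, layout)
volume = volume_um3(400, layout)
speed = speed_um_per_s(20, 2, layout)
```

The same count of 400 elements means 100 square micrometres as pixels and
200 cubic micrometres as voxels, which is why algorithms working on the
raw matrix must be given the layout before their results are interpreted.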

Content extraction

The selected objects are sent to a quantification module which attaches an
array of quantitative descriptors (shape, density …) to each object. We
expand or collapse the content extraction according to the relevance of
these descriptors for describing the biological phenomenon. Content
extraction is multiplexed, just like the experiment itself.

Objects belonging to the same biological entity are tagged to allow for a
linked exploration of the feature space created for each individual
object. The resulting data arrays can be fed into analytical tools
appropriate for analysing a high dimensional linked feature space or
feature hyperspace. The dynamics of the attributes of the biological
system need not be aligned with the features we extract to create a
quantitative representation. An attribute change and the feature which we
expect to represent this change may not be perfectly aligned, so we may
only capture a fraction of the actual change itself. Changes may occur in
a combined spatial-spectral and temporal space of which we can only
capture certain features, such as length, intensity, volume, etc.
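
Tagging objects so that their features can be explored together can be
sketched as a simple grouping step. The record layout used here (an
"entity" tag, a "kind" and a "features" dictionary per object) is a
hypothetical choice for the example, not a format taken from the text.

```python
from collections import defaultdict


def link_by_entity(objects):
    """Merge the features of all objects that carry the same entity tag."""
    linked = defaultdict(dict)
    for obj in objects:
        for name, value in obj["features"].items():
            linked[obj["entity"]][f'{obj["kind"]}.{name}'] = value
    return dict(linked)


objects = [
    {"entity": "cell-1", "kind": "nucleus", "features": {"area": 40}},
    {"entity": "cell-1", "kind": "cytoplasm", "features": {"area": 110}},
    {"entity": "cell-2", "kind": "nucleus", "features": {"area": 35}},
]
feature_space = link_by_entity(objects)
```

Each entity then contributes one row to the linked feature space, with one
column per object kind and feature.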

The feature sets can be fed into analytical systems for statistical data
analysis, exploratory statistics, classification and clustering.
Classification performance can be improved by combining several
independent classifiers on the feature sets. The resultant vector of a
multiparametric quantification may point in the most meaningful direction
to capture a change. Both parametric and nonparametric approaches to
classification can be used.
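
Combining independent classifiers can be as simple as a majority vote. The
sketch below uses three hypothetical threshold rules on three features;
the thresholds, feature names and class labels are invented for the
example, and any real classifiers could take their place.

```python
from collections import Counter


def by_area(features):
    return "disease" if features["area"] > 100 else "healthy"


def by_intensity(features):
    return "disease" if features["mean_intensity"] > 0.5 else "healthy"


def by_shape(features):
    return "disease" if features["circularity"] < 0.7 else "healthy"


def majority_vote(classifiers, features):
    """Return the label chosen by most classifiers."""
    votes = Counter(classifier(features) for classifier in classifiers)
    label, _count = votes.most_common(1)[0]
    return label


classifiers = [by_area, by_intensity, by_shape]
sample = {"area": 120, "mean_intensity": 0.3, "circularity": 0.9}
decision = majority_vote(classifiers, sample)
```

Here the area rule alone would call the object diseased, but the other two
rules outvote it. An odd number of classifiers avoids ties in a two-class
problem.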

We often try to do our experiments on a non-changing background (genetic
homogeneity) or average the background noise by randomisation. But is
what we call noise not, in many cases, a poorly understood but maybe
meaningful dynamic behaviour of a system? Trying to describe changes
relative to
underlying oscillations, e.g. cell cycle, by using dynamic background
reporters could help to find dynamic correlations between events.

If you notice something incorrect or have any questions, send me an email.

Email: pvosta at cs dot com

First on-line version published on 9 Jan. 2005, last update on 7 Feb. 2005

The author of this webpage is Peter Van Osta, MD.
