SANBI EST Clustering Benchmark Dataset

winhide winhide at
Tue Sep 28 20:26:32 EST 1999

SANBI is making available a dataset of masked ESTs suitable for
benchmarking. We are keen to evaluate the *hardware* performance of
clustering applications, and also the *clustering* performance and accuracy.
The dataset represents a randomly chosen set of human eye-expressed ESTs
that have been masked for repeats and vector sequences. It has not yet
been assigned to 'true' gene classes, as these have not all been assigned
against available genome data.

The dataset can be found at

The dataset is made available with the proviso that results of benchmarking
should be made broadly available.  Our own results and a suggested format are
found in

Algorithmic benchmarks can be found at

Unfortunately, due to spamming, uploads to the FTP site are not possible.
Please email results, including clusters if possible, to info at and
they will be posted.

Win Hide, Alan Christoffels, Andrey Ptitsyn and Antoine van Gelder
