Internal repeats Q & A

Paul Bieganski biegansk at hannibal.cs.umn.edu
Wed May 19 22:57:39 EST 1993


In article <930514075122.5345 at CUCCFA.CCC.COLUMBIA.EDU> MARK at CUCCFA.CCC.COLUMBIA.EDU (T. Mark Reboul) writes:
>...
>I'm no expert, but there seems to have been a lot of work done on "suffix
>trees" and their efficient construction, in the pure c.s. field, related 
>to lexicographic ordering/sorting problems, and perhaps some of those 
>algorithms might become useful for handling longer sequences.
>
>Mark Reboul
>Columbia University
>Columbia-Presbyterian Cancer Center Computing Facility
>mark at cuccfa.ccc.columbia.edu

Yes - suffix trees are a way of answering this (and many other) question
about a sequence.  Suffix trees and generalized suffix trees (suffix trees of
multiple sequences) provide a unique content-addressable way of accessing
sequence data and provide information about many 'interesting' features, such
as internal repeats.  Actually, suffix trees and generalized suffix trees
provide the basis for a number of interesting applications including searches
('exact' and homology-based), contig reassembly and multiple sequence
alignment.  The key to wider use of these structures is their efficient
creation and storage (they tend to get quite large).  Our group has been
researching suffix-tree based sequence processing for a while and we have
developed a prototype system that is capable of storing the content of
GenBank in a single tree and performing searches on it.  I am currently
writing a paper describing some of our work for HICSS-27 (Hawaii Int'l Conf.
on Systems Sciences).

-Paul
-----------------------------------------------------------------------------
Paul Bieganski
University of Minnesota, Minneapolis, Computer Science Department
biegansk at cs.umn.edu








--
==============================================================================
| Paul Bieganski | biegansk at cs.umn.edu |                                     |
==============================================================================




More information about the Bio-soft mailing list