How to detect a tandem repeat?

frist at
Tue May 25 18:14:09 EST 1993

In article <C7IG5p.CKH at> billyli at (Billy Li) writes:
>  Sorry for a dumb question.  Does anyone know how to precisely define
>a tandem repeat (possibly with a good reference)?  Biologists have
>published sequences claiming a segment is a tandem repeat.  How do they
>detect them and is there any algorithm or software that can detect
>tandem repeats?  How do tandem repeats differ from slippage, since
>both types of repeats occur together?
>  Sorry for so many questions and I look forward to hearing your responses.
>  Please e-mail if possible.
>Billy Li,  email: billyli at
>Department of Statistics,
>University of Hong Kong.
>Tel: (852) 8591920     Fax: (852) 8589041
There are several approaches to detecting tandem repeats:

1) Needleman/Wunsch/Sellers-type algorithms. These algorithms usually
approximate an exhaustive comparison of a sequence with itself over all
possible alignments. There are many variations on this approach.

e.g.         GATGATGAT--->  slide the top sequence along the bottom sequence.

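As a rough illustration of the sliding comparison in 1), here is a minimal Python sketch. It is NOT a Needleman/Wunsch/Sellers implementation: it uses ungapped comparisons only (no insertions or deletions, no scoring matrix), and the function names are hypothetical. Sliding a copy of the sequence by `shift` positions and counting identities gives a peak at the repeat period; the longest run of consecutive matches at that shift locates the repeat itself.

```python
def shift_match_counts(seq, max_shift=None):
    """Ungapped self-comparison: slide a copy of seq by each offset.

    Returns {shift: number of positions i with seq[i] == seq[i + shift]}.
    """
    n = len(seq)
    if max_shift is None:
        max_shift = n - 1
    counts = {}
    for shift in range(1, max_shift + 1):
        counts[shift] = sum(1 for i in range(n - shift) if seq[i] == seq[i + shift])
    return counts


def longest_run_at_shift(seq, shift):
    """Longest stretch where seq[i] == seq[i + shift].

    Returns (start, run). A run of length `run` means that
    seq[start : start + shift + run] is periodic with period `shift`,
    i.e. a tandem repeat of a unit `shift` bases long.
    """
    best_len, best_start, run = 0, 0, 0
    for i in range(len(seq) - shift):
        if seq[i] == seq[i + shift]:
            run += 1
            if run > best_len:
                best_len, best_start = run, i - run + 1
        else:
            run = 0
    return best_start, best_len
```

Usage, with the GATGATGAT example embedded in flanking sequence:

    seq = "CCGATGATGATTT"
    start, run = longest_run_at_shift(seq, 3)
    print(seq[start:start + 3 + run])   # GATGATGAT

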
2) Data dictionaries. By sorting all possible subsequences in lexical
order, tandemly repeated sequences will appear near each other in the
sorted list.

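Approach 2) can be sketched in Python by sorting suffixes: every subsequence is a prefix of some suffix, so sorting the suffixes lexically sorts all subsequences. This is a sketch under stated assumptions, not any particular published program; the function names are hypothetical, the plain sort is slow on long sequences, and comparing only lexically adjacent suffixes can miss some repeats. The test used: if the suffixes starting at i < j share a prefix at least p = j - i long, then seq[i:j] is immediately followed by another copy of itself.

```python
def sorted_suffixes(seq):
    """Start positions of all suffixes of seq, in lexical order."""
    return sorted(range(len(seq)), key=lambda i: seq[i:])


def common_prefix_len(a, b):
    """Length of the longest common prefix of strings a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def tandem_candidates(seq, min_period=1):
    """Yield (start, period, copies) found among lexically adjacent suffixes.

    Suffixes at i < j sharing a prefix of length lcp >= p = j - i means
    seq[i : i + p + lcp] has period p, i.e. 1 + lcp // p whole copies
    of the unit seq[i:j] in tandem.
    """
    order = sorted_suffixes(seq)
    for a, b in zip(order, order[1:]):
        i, j = min(a, b), max(a, b)
        p = j - i
        lcp = common_prefix_len(seq[a:], seq[b:])
        if p >= min_period and lcp >= p:
            yield i, p, 1 + lcp // p
```

Usage, reporting the candidate with the most copies:

    seq = "CCGATGATGATTT"
    best = max(tandem_candidates(seq, min_period=2), key=lambda t: t[2])
    print(best)   # (2, 3, 3): "GAT" three times, starting at position 2

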
3) Dot-Matrix similarity searches. In this approach, the sequence is
written on both X & Y axes of a matrix. Where subsequences match
above some threshold, a dot or some other character is printed at the
corresponding X,Y coordinate in the matrix. 

               10        20
      C         .         .         .         .
      G A       .         .         .         .
      T  A      .         .         .         .
      A   A     .         .         .         .
      T    A    .         .         .         .
      C     A   .         .         .         .
      A      A  .         .         .         .
      T       A .A  A     .         .         .
      G        A. A  A    .         .         .
      T       A .A  A     .         .         .
      G        A. A  A    .         .         .
      A         A  A  A   .         .         .
      T       A .A  A     .         .         .
      G        A. A  A    .         .         .
      A         A  A  A   .         .         .
      T         .      A  .         .         .
      A         .       A .         .         .
      C         .        A.         .         .

The main diagonal of A's indicates that the sequence matches itself at
all positions. Tandem repeats appear as shorter diagonals parallel to the
main diagonal and symmetrically arrayed about it, offset from it by the
length of the repeat unit. The beauty of this approach is that it
exploits the superb pattern-recognition abilities of the human brain.
In my opinion, this method is far better than 1 or 2 at finding tandem
repeats.

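The dot-matrix idea in 3) can be reproduced with a short Python sketch. Assumptions, not taken from any specific program: a cell (i, j) is marked when the windows of `window` bases starting at i and j agree at `min_matches` or more positions, and `dot_matrix`, `render` and `repeat_offsets` are hypothetical names. Besides printing a plot for visual inspection, the sketch scans the diagonals parallel to the main one, since a tandem repeat with a unit of p bases produces a diagonal offset by p.

```python
def dot_matrix(seq, window=3, min_matches=3):
    """Windowed self dot plot; grid[i][j] is True when the windows starting
    at i and j agree at >= min_matches of their `window` positions."""
    n = len(seq) - window + 1
    grid = []
    for i in range(n):
        row = []
        for j in range(n):
            matches = sum(seq[i + k] == seq[j + k] for k in range(window))
            row.append(matches >= min_matches)
        grid.append(row)
    return grid


def render(seq, grid, mark="A"):
    """ASCII plot: sequence across the top and down the left edge,
    one character per window start."""
    lines = ["  " + seq[:len(grid)]]
    for i, row in enumerate(grid):
        lines.append(seq[i] + " " + "".join(mark if hit else " " for hit in row))
    return "\n".join(lines)


def repeat_offsets(grid, min_len=2):
    """Offsets d > 0 whose diagonal (i, i + d) holds a run of >= min_len
    marked cells. The plot is symmetric, so the mirror image below the
    main diagonal is not scanned separately."""
    n = len(grid)
    found = []
    for d in range(1, n):
        run = best = 0
        for i in range(n - d):
            run = run + 1 if grid[i][i + d] else 0
            best = max(best, run)
        if best >= min_len:
            found.append(d)
    return found
```

Usage:

    seq = "CCGATGATGATTT"
    grid = dot_matrix(seq)
    print(render(seq, grid))
    print(repeat_offsets(grid))   # [3]

Requiring exact window matches (min_matches == window) keeps the plot sparse; lowering min_matches tolerates point mutations between repeat copies at the cost of more background noise.

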
I don't immediately have references to programs specifically designed
for searching for tandem repeats, but any similarity search program
can be used for this purpose. 

An explanation of dot-matrix searches can be found in:

Fristensky, B. (1986) Nucleic Acids Res. 14:597-610.

which is available by anonymous FTP to the directory psgendb at

Brian Fristensky                | 
Department of Plant Science     |  A question is like a knife that slices
University of Manitoba          |  through the stage backdrop and gives us
Winnipeg, MB R3T 2N2  CANADA    |  a look at what lies hidden behind.
frist at          |  
Office phone:   204-474-6085    |  Milan Kundera, THE UNBEARABLE LIGHTNESS 
FAX:            204-261-5732    |  OF BEING

More information about the Bio-soft mailing list