The "runs" conjecture
by M. Crochemore, L. Ilie, and L. Tinta

  The conjecture

A run is a maximal (non-extendable) repetition of exponent at least two. Here is an example: the string w=abbababbaba has a run [3..7] with period 2 and exponent 2.5, that is, w [3..7] = babab = (ba) 2.5.Other runs are [2..3], [7..8],[8..11], [5..10] and [1..11].The result of a two-decade effort in the stringology community to find an algorithm to compute all runs in a string in linear time resulted in the paper of [Kolpakov, Kucherov, FOCS'99] that

(i) used previous techniques of Crochemore, Main, and Lorentz to construct an algorithm that computes all runs in time proportional to the size of the output and
(ii) proved that the maximum number of runs in a string of length n, RUNS(n), is linear, i.e., RUNS(n) <= cn , where c is a constant.

Therefore, the crucial contribution was (ii). However, they could not provide any bound on the constant c.

The conjecture of Kolpakov and Kucherov, supported by computations, is that, for binary alphabets,

RUNS(n) <= n.

  Known upper bounds

  Current state of art

The idea used by [Crochemore and Ilie, JCSS, 2007] is to count the runs by their centers (the starting position of the second period) as well as count separately microruns (runs with short period).


Theorem. There is at most 1 center of a d-run on average in each interval of length d.


The Theorem is used as follows. If we put di = (p/ 2 )( 3 / 2 )i, i>=0, then the number of runs with period p or larger is at most n/ d 0 + n/ d 1 + n/ d 2 + .... Therefore

RUNS >=p (n) <= ( 6 /p)n.


RUNS <=p (n) <= bn.

Putting these two bounds together for p =  9 and b = 1, we obtained in [Crochemore and Ilie, JCSS, 2007] that RUNS(n) <= ( 6/10 )n + n = 1.6 n. For larger values of p , and/or smaller values of b , better bounds can be obtained but the computation can become very demanding.

In [Crochemore, Ilie, Tinta, CPM'08] we gave an algorithm that reduces very much the complexity of this computation and then implemented it on the SHARCNET high performance computer ( ). The results obtained are given below. (Note that they hold for arbitrary alphabets.)

Current best bound:

  Lower bounds

Back to home page