The "runs" conjecture
by M. Crochemore, L. Ilie, and L. Tinta
A run is a maximal (non-extendable) repetition of exponent at least two. Here is an example: the string w=abbababbaba has a run [3..7] with period 2 and exponent 2.5, that is, w [3..7] = babab = (ba) 2.5.Other runs are [2..3], [7..8],[8..11], [5..10] and [1..11].The result of a two-decade effort in the stringology community to find an algorithm to compute all runs in a string in linear time resulted in the paper of [Kolpakov, Kucherov, FOCS'99] that
used previous techniques of Crochemore, Main, and Lorentz to construct an
algorithm that computes all runs in time proportional to the size of the output
(ii) proved that the maximum number of runs in a string of length n, RUNS(n), is linear, i.e., RUNS(n) <= cn , where c is a constant.
Therefore, the crucial contribution was (ii). However, they
could not provide any bound on the constant c.
The conjecture of Kolpakov and Kucherov, supported by computations, is that, for binary alphabets,
RUNS(n) <= n.
Known upper bounds
Current state of art
The idea used by [Crochemore and Ilie, JCSS, 2007] is to count the runs by their centers (the starting position of the second period) as well as count separately microruns (runs with short period).
Theorem. There is at most 1 center of a d-run on average in each interval of length d.
The Theorem is used as follows. If we put di = (p/ 2 )( 3 / 2 )i, i>=0, then the number of runs with period p or larger is at most n/ d 0 + n/ d 1 + n/ d 2 + .... Therefore
RUNS >=p (n) <= ( 6 /p)n.
RUNS <=p (n) <= bn.
Putting these two bounds together for p = 9 and b = 1, we obtained in [Crochemore and Ilie, JCSS, 2007] that RUNS(n) <= ( 6/10 )n + n = 1.6 n. For larger values of p , and/or smaller values of b , better bounds can be obtained but the computation can become very demanding.
In [Crochemore, Ilie, Tinta, CPM'08]
we gave an algorithm that reduces very much the complexity of this
computation and then implemented it on the SHARCNET high performance computer ( www.sharcnet.ca
). The results obtained are given below. (Note that they hold for
Current best bound:
Back to home page