The "runs" conjecture
by M.
Crochemore, L. Ilie, and L. Tinta
The conjecture
A run
is a maximal (non-extendable) repetition of exponent at least two. Here is
an example: the string w=abbababbaba has a run [3..7] with period 2
and exponent 2.5, that is, w [3..7] = babab = (ba) 2.5.Other runs are [2..3],
[7..8],[8..11], [5..10] and [1..11].The result of a two-decade effort in the
stringology community to find an algorithm to compute all runs in a string in
linear time resulted in the paper of [Kolpakov, Kucherov, FOCS'99] that
(i)
used previous techniques of Crochemore, Main, and Lorentz to construct an
algorithm that computes all runs in time proportional to the size of the output
and
(ii) proved that the maximum number of runs in a string of length n,
RUNS(n), is linear, i.e., RUNS(n) <= cn , where c is a
constant.
Therefore, the crucial contribution was (ii). However, they
could not provide any bound on the constant c.
The conjecture
of Kolpakov and Kucherov, supported by
computations, is that, for binary alphabets,
RUNS(n) <= n.
Known
upper bounds
Current state of art
The idea used by [Crochemore and Ilie, JCSS, 2007] is to count the runs by
their centers (the starting position of the second period) as well as count
separately microruns (runs with short period).
Theorem. There is at most 1 center of
a d-run on average in each interval of length d.
The Theorem is used as follows. If we put di = (p/ 2 )( 3 / 2 )i, i>=0, then the number of runs with
period p or larger is at most n/ d 0 + n/ d 1 + n/ d 2 + .... Therefore
RUNS >=p (n)
<= ( 6 /p)n.
RUNS <=p (n)
<= bn.
Putting these two bounds together for p = 9 and b = 1, we obtained in [Crochemore and Ilie, JCSS, 2007] that RUNS(n) <= ( 6/10 )n + n = 1.6 n. For larger values of p , and/or smaller values of b , better bounds can be obtained but the computation can become very demanding.
In [Crochemore, Ilie, Tinta, CPM'08]
we gave an algorithm that reduces very much the complexity of this
computation and then implemented it on the SHARCNET high performance computer ( www.sharcnet.ca
). The results obtained are given below. (Note that they hold for
arbitrary alphabets.)
Current best bound: