**The "runs" conjecture****
**by M.
Crochemore, L. Ilie, and L. Tinta

**The conjecture **

A *run
*is a maximal (non-extendable) repetition of exponent at least two. Here is
an example: the string *w=abbababbaba *has a run [3..7] with *period *2
and *exponent *2.5, that is, *w *[3..7] *= babab *= *(ba) *^{2.5}.Other runs are [2..3],
[7..8],[8..11], [5..10] and [1..11].The result of a two-decade effort in the
stringology community to find an algorithm to compute all runs in a string in
linear time resulted in the paper of [Kolpakov, Kucherov, FOCS'99] that

(i)
used previous techniques of Crochemore, Main, and Lorentz to construct an
algorithm that computes all runs in time proportional to the size of the output
and

(ii) proved that the maximum number of runs in a string of length *n,
RUNS(n), *is linear, i.e., *RUNS(n) <= cn *, where *c *is a
constant.

Therefore, the crucial contribution was (ii). However, they
could not provide any bound on the constant *c. *

*The conjecture
of Kolpakov and Kucherov, supported by
computations, is that, for binary alphabets, *

*RUNS(n) <= n. *

**Known
upper bounds **

- Rytter,
STACS'06:
*RUNS(n) <=***5***n* - Rytter, Inf. and Comput.,
2007:
*RUNS(n) <=***3.44***n* - Puglisi, Simpson, and Smyth, submitted
2007:
*RUNS(n) <=***3.48***n* - Crochemore and Ilie, JCSS, 2007:
*RUNS(n) <=***1.6***n*

**Current state of art **

The idea used by [Crochemore and Ilie, JCSS, 2007] is to count the runs by
their centers (the starting position of the second period) as well as count
separately *microruns *(runs with short period)*. *

The following theorem is used to bound the number of runs with period*Macroruns.**p*or larger. A*d*-run is a run with period between 2*d*and 3*d.*

* *

*Theorem. **There is at most *1 *center of
a d-run on average in each interval of length d. *

The Theorem is used as follows. If we put *d*_{i}* = (p/ *2 *)( *3 */ *2 *)*^{i}*, i>=0, *then the number of runs with
period *p *or larger is at most *n/ d *_{0} *+ _{ }n/ d *

*RUNS *_{>=p}* (n)
<= ( *

The runs with short periods are counted as follows. For a given bound*Microruns.**b*and maximum period*p,*the runs are non-uniformly distributed and we try to amortize them. That is, given a position*i,*we try all possible combinations of periods for the runs with centers at the left of*i,*until a position*i-j*is found such that the ratio between the number of centers inside the interval [*i-j..i*]and its length,*j+1*, falls below*b.*Also, at any moment, the number of centers of runs that have both the center and beginning inside the interval should satisfy the corresponding amortizing condition. If successful, this procedure proves that

*RUNS *_{<=p}* (n)
<= bn. *

Putting
these two bounds together for *p = *9 and *b =
*1, we obtained in [Crochemore and Ilie, JCSS, 2007] that *RUNS(n) <= ( *6/10
*)n + n = *1.6 *n. *For larger values of *p *, and/or smaller
values of *b *, better bounds can be obtained but the computation can become very
demanding.

In [Crochemore, Ilie, Tinta, CPM'08]
we gave an algorithm that reduces very much the complexity of this
computation and then implemented it on the SHARCNET high performance computer ( www.sharcnet.ca
). The results obtained are given below. (Note that they hold for
arbitrary alphabets.)

*Current best bound: *

*p=*10*, b=*0.85*: RUNS(n) <=***1.396***n**p=*15*, b=*0.89*: RUNS(n) <=***1.265***n**p=*20*, b=*0.89*: RUNS(n) <=***1.176***n**p=*25*, b=*0.91*: RUNS(n) <=***1.141***n**p=*30*, b=*0.91*: RUNS(n) <=***1.104n***p=*35*, b=*0.93*: RUNS(n) <=***1.097***n**p=*40*, b=*0.93*: RUNS(n) <=***1.077***n**p=*50*, b=*0.93*: RUNS(n) <=***1.048***n**p=*60*, b=*0.93*: RUNS(n) <=***1.029***n*

**Lower
bounds **

- Franek, Simpson, and Smyth,
2003:
*RUNS(n) >=***0.927..***.n* - K. Kusano, W. Matsubara, A. Ishino, H.
Bannai, A. Shinohara, 2008:
*RUNS(n) >=***0.944565n**(see http://www.shino.ecei.tohoku.ac.jp/runs/)