Title: Sensitivity of string compressors and repetitiveness measures

Abstract: The sensitivity of a string compression algorithm $C$ measures how much the output size $C(T)$ for an input string $T$ can increase when a single-character edit operation is performed on $T$. This notion makes it possible to measure the robustness of compression algorithms against errors and/or dynamic changes occurring in the input string. In this paper, we analyze the worst-case multiplicative sensitivity of string compression algorithms, which is defined by $\max_{T \in \Sigma^n}\{C(T')/C(T) : ed(T, T') = 1\}$, where $ed(T, T')$ denotes the edit distance between $T$ and $T'$. For the most common versions of the Lempel-Ziv 77 compressors, we prove that the worst-case multiplicative sensitivity is upper bounded by a small constant, and give matching lower bounds. We generalize these results to the smallest bidirectional scheme $b$. In addition, we show that the sensitivity of a grammar-based compressor called GCIS is also a small constant. Further, we extend the notion of worst-case sensitivity to string repetitiveness measures such as the smallest string attractor size $\gamma$ and the substring complexity $\delta$, and show that the worst-case sensitivity of $\delta$ is also a small constant. These results contrast with previously known related results: the size $z_{\rm 78}$ of the Lempel-Ziv 78 factorization can increase by a factor of $\Omega(n^{1/4})$ [Lagarde and Perifel, 2018], and the number $r$ of runs in the Burrows-Wheeler transform can increase by a factor of $\Omega(\log n)$ [Giuliani et al., 2021], when a character is prepended to an input string of length $n$. By applying our sensitivity bounds for $\delta$ or the smallest grammar to known results (cf. [Navarro, 2021]), we derive some non-trivial upper bounds for the sensitivities of important string compressors and repetitiveness measures including $\gamma$, $r$, LZ-End, RePair, LongestMatch, and AVL-grammar.
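
To make the definition of worst-case multiplicative sensitivity concrete, the following is a minimal brute-force sketch in Python. It is an illustration only and not the paper's proof technique: it enumerates every string of a small length $n$ over a small alphabet and every string at edit distance exactly 1 from it. All function names (`lz77_phrases`, `substring_complexity`, `single_edits`, `worst_case_sensitivity`) are hypothetical. The LZ77 variant used here (greedy, self-referencing, a fresh character forming a phrase of length 1) is an assumption and may differ from the variants analyzed in the paper. The substring complexity follows the standard definition $\delta = \max_k d_k(T)/k$, where $d_k(T)$ is the number of distinct substrings of length $k$.

```python
from fractions import Fraction
from itertools import product


def lz77_phrases(t: str) -> int:
    """Phrase count of a greedy, self-referencing LZ77 factorization
    (one common variant; assumed here for illustration)."""
    i, z, n = 0, 0, len(t)
    while i < n:
        longest = 0
        for j in range(i):  # candidate earlier starting positions
            length = 0
            while i + length < n and t[j + length] == t[i + length]:
                length += 1
            longest = max(longest, length)
        i += max(longest, 1)  # a fresh character is a phrase of length 1
        z += 1
    return z


def substring_complexity(t: str) -> Fraction:
    """delta(T) = max over k of (number of distinct length-k substrings) / k."""
    n = len(t)
    return max(
        Fraction(len({t[i:i + k] for i in range(n - k + 1)}), k)
        for k in range(1, n + 1)
    )


def single_edits(t: str, alphabet: str) -> set:
    """All non-empty strings at edit distance exactly 1 from t."""
    out = set()
    for i in range(len(t) + 1):
        for c in alphabet:
            out.add(t[:i] + c + t[i:])            # insertion
    for i in range(len(t)):
        out.add(t[:i] + t[i + 1:])                # deletion
        for c in alphabet:
            if c != t[i]:
                out.add(t[:i] + c + t[i + 1:])    # substitution
    out.discard("")  # the measures above are not meaningful for the empty string
    return out


def worst_case_sensitivity(measure, n: int, alphabet: str = "ab") -> Fraction:
    """max over T in alphabet^n and T' with ed(T, T') = 1 of measure(T') / measure(T)."""
    best = Fraction(0)
    for chars in product(alphabet, repeat=n):
        t = "".join(chars)
        base = measure(t)
        for t2 in single_edits(t, alphabet):
            best = max(best, Fraction(measure(t2)) / base)
    return best


if __name__ == "__main__":
    for n in range(1, 7):
        print(n,
              worst_case_sensitivity(lz77_phrases, n),
              worst_case_sensitivity(substring_complexity, n))
```

The search is exponential in $n$, so it is only practical for very short strings; it can give a feel for the ratio on small cases but says nothing about asymptotic bounds.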
Comments: The journal version is superseded by this version (we added almost tight bounds for Bisection with insertions and deletions)
Subjects: Data Structures and Algorithms (cs.DS)
Journal reference: Information and Computation, Volume 291, March 2023, 104999
DOI: 10.1016/j.ic.2022.104999
Cite as: arXiv:2107.08615 [cs.DS]
  (or arXiv:2107.08615v6 [cs.DS] for this version)

Submission history

From: Shunsuke Inenaga
[v1] Mon, 19 Jul 2021 05:23:30 GMT (238kb,D)
[v2] Sun, 26 Dec 2021 13:01:02 GMT (327kb,D)
[v3] Mon, 3 Jan 2022 12:46:16 GMT (331kb,D)
[v4] Sun, 6 Nov 2022 06:39:45 GMT (473kb,D)
[v5] Wed, 4 Jan 2023 01:52:15 GMT (473kb,D)
[v6] Thu, 9 Feb 2023 12:18:47 GMT (489kb,D)
