
Title: Pattern Matching on Grammar-Compressed Strings in Linear Time

Abstract: The most fundamental problem considered in algorithms for text processing is pattern matching: given a pattern $p$ of length $m$ and a text $t$ of length $n$, does $p$ occur in $t$? Multiple versions of this basic question have been considered, and by now we know algorithms that are fast both in practice and in theory. However, the rapid increase in the amount of generated and stored data creates the need to design algorithms that operate directly on compressed representations of data. In the compressed pattern matching problem we are given a compressed representation of the text, with $n$ being the length of the compressed representation and $N$ being the length of the text, and an uncompressed pattern of length $m$. The most challenging scenario, which is nonetheless relevant when working with highly repetitive data such as biological data, is when the chosen compression method is capable of describing a string of exponential length (in the size of its representation). An elegant formalism for such a compression method is that of straight-line programs, which are simply context-free grammars describing exactly one string. While it has been known that the compressed pattern matching problem can be solved in $O(m+n\log N)$ time for this compression method, the existence of a linear-time algorithm remained an open question. We resolve this question by presenting an $O(n+m)$ time algorithm that, given a context-free grammar of size $n$ that produces a single string $t$ and a pattern $p$ of length $m$, decides whether $p$ occurs in $t$ as a substring. To this end, we devise improved solutions for the weighted ancestor problem and the substring concatenation problem.
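To make the setting concrete, here is a minimal sketch of a straight-line program and of compressed pattern matching on it. The encoding is an assumption for illustration: a Python list in which rule i is either a single character or a pair (j, k) of earlier rule indices, with the last rule as the start symbol; all names are hypothetical. The sketch shows that a grammar with n rules can describe a string of length exponential in n, and it decides occurrence with a simple baseline: for every rule X → YZ reachable from the start symbol, it checks the window formed by the last m-1 characters of Y and the first m-1 characters of Z. This baseline takes roughly O(nm) time and is not the O(n+m) algorithm of the paper, which relies on improved weighted ancestor and substring concatenation structures.

```python
from typing import Union

# Hypothetical SLP encoding (an assumption, not the paper's): rules[i] is
# either a one-character string (X_i -> c) or a pair (j, k) with j, k < i
# (X_i -> X_j X_k). The last rule is the start symbol; its expansion is t.
Rule = Union[str, tuple[int, int]]


def expansion_lengths(rules: list[Rule]) -> list[int]:
    """Length of exp(X_i) for every rule; can be exponential in len(rules)."""
    lengths: list[int] = []
    for rhs in rules:
        if isinstance(rhs, str):
            lengths.append(1)
        else:
            lengths.append(lengths[rhs[0]] + lengths[rhs[1]])
    return lengths


def _tail(s: str, k: int) -> str:
    """Last k characters of s (all of s if shorter; "" when k == 0)."""
    return s[max(0, len(s) - k):]


def occurs(rules: list[Rule], p: str) -> bool:
    """Decide whether p occurs in t without decompressing t (O(n*m) baseline)."""
    m = len(p)
    if m == 0:
        return True
    k = m - 1
    n = len(rules)
    # Only rules reachable from the start symbol yield substrings of t.
    reachable = [False] * n
    reachable[n - 1] = True
    for i in range(n - 1, -1, -1):
        rhs = rules[i]
        if reachable[i] and not isinstance(rhs, str):
            reachable[rhs[0]] = reachable[rhs[1]] = True
    # pre[i] / suf[i]: first / last (at most) m-1 characters of exp(X_i).
    pre: list[str] = []
    suf: list[str] = []
    for i, rhs in enumerate(rules):
        if isinstance(rhs, str):
            pre.append(rhs[:k])
            suf.append(_tail(rhs, k))
            # A single-character pattern can match a terminal rule directly.
            if reachable[i] and rhs == p:
                return True
        else:
            y, z = rhs
            pre.append((pre[y] + pre[z])[:k])
            suf.append(_tail(suf[y] + suf[z], k))
            # Occurrences crossing the Y|Z boundary lie inside this window.
            if reachable[i] and p in suf[y] + pre[z]:
                return True
    return False


# Fibonacci-style SLP whose start symbol expands to "abaababa".
fib: list[Rule] = ["a", "b", (0, 1), (2, 0), (3, 2), (4, 3)]
print(occurs(fib, "baa"), occurs(fib, "bb"))  # True False

# 41 rules describing the string a^(2^40) by repeated doubling.
doubling: list[Rule] = ["a"] + [(i, i) for i in range(40)]
print(expansion_lengths(doubling)[-1] == 2**40)  # True
```

The window check is complete because an occurrence of p that does not lie inside a single child must cross the Y|Z boundary of the lowest rule whose expansion contains it, and such an occurrence fits within m-1 characters on each side of that boundary; single-character patterns are caught by the terminal rules.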
Subjects: Data Structures and Algorithms (cs.DS)
Cite as: arXiv:2111.05016 [cs.DS]
  (or arXiv:2111.05016v1 [cs.DS] for this version)

Submission history

From: Moses Ganardi [view email]
[v1] Tue, 9 Nov 2021 09:24:52 GMT (23kb)
