Computer Science > Data Structures and Algorithms
Title: String Indexing for Top-$k$ Close Consecutive Occurrences
(Submitted on 8 Jul 2020 (v1), last revised 14 Feb 2024 (this version, v3))
Abstract: The classic string indexing problem is to preprocess a string $S$ into a compact data structure that supports efficient subsequent pattern matching queries, that is, given a pattern string $P$, report all occurrences of $P$ within $S$. In this paper, we study a basic and natural extension of string indexing called the string indexing for top-$k$ close consecutive occurrences problem (SITCCO). Here, a consecutive occurrence is a pair $(i,j)$, $i < j$, such that $P$ occurs at positions $i$ and $j$ in $S$ and there is no occurrence of $P$ between $i$ and $j$, and their distance is defined as $j-i$. Given a pattern $P$ and a parameter $k$, the goal is to report the top-$k$ consecutive occurrences of $P$ in $S$ of minimal distance. The challenge is to compactly represent $S$ while supporting queries in time close to the length of $P$ and $k$. We give three time-space trade-offs for the problem. Let $n$ be the length of $S$, $m$ the length of $P$, and $\epsilon\in(0,1]$. Our first result achieves $O(n\log n)$ space and optimal query time of $O(m+k)$. Our second and third results achieve linear space and query times either $O(m+k^{1+\epsilon})$ or $O(m + k \log^{1+\epsilon} n)$. Along the way, we develop several techniques of independent interest, including a new translation of the problem into a line segment intersection problem and a new recursive clustering technique for trees.
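To make the query semantics concrete, here is a minimal brute-force sketch. It is not the paper's data structure and does no preprocessing, so every query scans all of $S$. The function names `occurrences` and `top_k_close_consecutive` are hypothetical. The sketch assumes 0-indexed positions, overlapping occurrences, a nonempty pattern, and ties in distance broken by the smaller start position $i$. The abstract does not specify tie-breaking, so that rule is an illustrative assumption.

```python
import heapq


def occurrences(s: str, p: str) -> list[int]:
    """All (possibly overlapping) start positions of p in s, in increasing order."""
    positions = []
    start = s.find(p)
    while start != -1:
        positions.append(start)
        start = s.find(p, start + 1)  # advance by one so overlapping matches are found
    return positions


def top_k_close_consecutive(s: str, p: str, k: int) -> list[tuple[int, int]]:
    """Brute-force top-k close consecutive occurrences query (hypothetical helper).

    Since occ is sorted and complete, adjacent entries (i, j) have no occurrence
    of p strictly between them, i.e. they are consecutive occurrences. We return
    the k pairs with the smallest distance j - i; ties are broken by i (assumption).
    """
    occ = occurrences(s, p)
    pairs = list(zip(occ, occ[1:]))
    return heapq.nsmallest(k, pairs, key=lambda ij: (ij[1] - ij[0], ij[0]))


# Usage: "ab" occurs at 0, 3, 5, 8 in "abaababaab".
# The consecutive pairs are (0, 3), (3, 5), (5, 8) with distances 3, 2, 3.
print(top_k_close_consecutive("abaababaab", "ab", 2))  # [(3, 5), (0, 3)]
```

This baseline costs $O(nm)$ time per query in the worst case. The indexes described in the abstract instead answer queries in time close to $O(m+k)$ after preprocessing $S$.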
Submission history
From: Teresa Anna Steiner
[v1] Wed, 8 Jul 2020 13:55:10 GMT (179kb,D)
[v2] Tue, 29 Sep 2020 09:18:53 GMT (236kb,D)
[v3] Wed, 14 Feb 2024 11:21:49 GMT (647kb,D)