We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:

Download:

Current browse context:

cs.DS

Change to browse by:

cs

References & Citations

DBLP - CS Bibliography

Bookmark

(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo

Computer Science > Data Structures and Algorithms

Title: On Locating Paths in Compressed Tries

Authors: Nicola Prezza
Abstract: In this paper, we consider the problem of compressing a trie while supporting the powerful \emph{locate} queries: to return the pre-order identifiers of all nodes reached by a path labeled with a given query pattern. Our result builds on top of the XBWT tree transform of Ferragina et al. [FOCS 2005] and generalizes the \emph{r-index} locate machinery of Gagie et al. [SODA 2018, JACM 2020] based on the run-length encoded Burrows-Wheeler transform (BWT). Our first contribution is to propose a suitable generalization of the run-length BWT to tries. We show that this natural generalization enjoys several of the useful properties of its counterpart on strings: in particular, the transform natively supports counting occurrences of a query pattern on the trie's paths and its size $r$ captures the trie's repetitiveness and lower-bounds a natural notion of trie entropy. Our main contribution is a much deeper insight into the combinatorial structure of this object. In detail, we show that a data structure of $O(r\log n) + 2n + o(n)$ bits, where $n$ is the number of nodes, allows locating the $occ$ occurrences of a pattern of length $m$ in nearly-optimal $O(m\log\sigma + occ)$ time, where $\sigma$ is the alphabet's size. Our solution consists in sampling $O(r)$ nodes that can be used as "anchor points" during the locate process. Once obtained the pre-order identifier of the first pattern occurrence (in co-lexicographic order), we show that a constant number of constant-time jumps between those anchor points lead to the identifier of the next pattern occurrence, thus enabling locating in optimal $O(1)$ time per occurrence.
Comments: Improved toehold lemma running time; added more detailed proofs that take care of all border cases in the locate strategy; postprint version to appear in SODA 2020
Subjects: Data Structures and Algorithms (cs.DS)
Cite as: arXiv:2004.01120 [cs.DS]
  (or arXiv:2004.01120v4 [cs.DS] for this version)

Submission history

From: Nicola Prezza [view email]
[v1] Thu, 2 Apr 2020 16:43:21 GMT (124kb,D)
[v2] Mon, 6 Apr 2020 10:47:11 GMT (127kb,D)
[v3] Sat, 11 Apr 2020 08:45:01 GMT (99kb,D)
[v4] Wed, 16 Dec 2020 23:33:41 GMT (1360kb,D)

Link back to: arXiv, form interface, contact.