Computer Science > Data Structures and Algorithms
Title: On Locating Paths in Compressed Tries
(Submitted on 2 Apr 2020 (v1), last revised 16 Dec 2020 (this version, v4))
Abstract: In this paper, we consider the problem of compressing a trie while supporting powerful \emph{locate} queries, which return the pre-order identifiers of all nodes reached by a path labeled with a given query pattern. Our result builds on top of the XBWT tree transform of Ferragina et al. [FOCS 2005] and generalizes the \emph{r-index} locate machinery of Gagie et al. [SODA 2018, JACM 2020] based on the run-length encoded Burrows-Wheeler transform (BWT). Our first contribution is to propose a suitable generalization of the run-length BWT to tries. We show that this natural generalization enjoys several of the useful properties of its counterpart on strings: in particular, the transform natively supports counting occurrences of a query pattern on the trie's paths, and its size $r$ captures the trie's repetitiveness and lower-bounds a natural notion of trie entropy. Our main contribution is a much deeper insight into the combinatorial structure of this object. In detail, we show that a data structure of $O(r\log n) + 2n + o(n)$ bits, where $n$ is the number of nodes, allows locating the $occ$ occurrences of a pattern of length $m$ in nearly-optimal $O(m\log\sigma + occ)$ time, where $\sigma$ is the alphabet's size. Our solution consists of sampling $O(r)$ nodes that can be used as "anchor points" during the locate process. Once the pre-order identifier of the first pattern occurrence (in co-lexicographic order) has been obtained, we show that a constant number of constant-time jumps between those anchor points leads to the identifier of the next pattern occurrence, thus enabling locating in optimal $O(1)$ time per occurrence.
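For readers less familiar with the string machinery that the abstract generalizes, the sketch below illustrates it on a plain string, not on a trie: it builds the BWT, counts its number $r$ of equal-letter runs, and counts pattern occurrences by backward search using only the BWT. This is not the paper's trie construction. The function names (`bwt`, `count_runs`, `backward_search_count`) are hypothetical, the BWT is built naively from sorted rotations, rank is computed by scanning, and the text is assumed to end with a unique sentinel `$` that sorts before every other symbol.

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler transform via sorted rotations (quadratic, illustration only).

    Assumes `text` ends with a unique sentinel '$' smaller than all other symbols.
    """
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)


def count_runs(s: str) -> int:
    """Number r of maximal runs of equal consecutive characters in s."""
    return sum(1 for i, c in enumerate(s) if i == 0 or c != s[i - 1])


def backward_search_count(bwt_str: str, pattern: str) -> int:
    """Count occurrences of `pattern` in the original text using only its BWT.

    C[c] is the number of BWT symbols strictly smaller than c, and rank(c, i)
    is the number of occurrences of c in bwt_str[:i] (computed naively here;
    real indexes answer rank in (near-)constant time).
    """
    C = {}
    total = 0
    for c in sorted(set(bwt_str)):
        C[c] = total
        total += bwt_str.count(c)

    def rank(c: str, i: int) -> int:
        return bwt_str[:i].count(c)

    lo, hi = 0, len(bwt_str)  # half-open range [lo, hi) of sorted rotations
    for c in reversed(pattern):  # extend the matched suffix one symbol leftwards
        if c not in C:
            return 0
        lo = C[c] + rank(c, lo)
        hi = C[c] + rank(c, hi)
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    b = bwt("banana$")
    print(b, count_runs(b), backward_search_count(b, "ana"))
```

On `banana$` this yields the BWT `annb$aa` with $r = 5$ runs and reports 2 occurrences of `ana`. Counting reduces to maintaining a contiguous range of sorted rotations; in the trie setting the abstract describes the analogous ordering as co-lexicographic over nodes, and the paper's contribution is recovering the identifiers inside that range in $O(1)$ time each.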
Submission history
From: Nicola Prezza
[v1] Thu, 2 Apr 2020 16:43:21 GMT (124kb,D)
[v2] Mon, 6 Apr 2020 10:47:11 GMT (127kb,D)
[v3] Sat, 11 Apr 2020 08:45:01 GMT (99kb,D)
[v4] Wed, 16 Dec 2020 23:33:41 GMT (1360kb,D)