Title: Parameterized DAWGs: efficient constructions and bidirectional pattern searches

Abstract: Two strings $x$ and $y$ of equal length over $\Sigma \cup \Pi$ are said to parameterized match (p-match) if there is a renaming bijection $f:\Sigma \cup \Pi \rightarrow \Sigma \cup \Pi$ that is the identity on $\Sigma$ and transforms $x$ into $y$ (or vice versa). The p-matching problem is to find the substrings of a text that p-match a given pattern. In this paper, we propose parameterized suffix automata (p-suffix automata) and parameterized directed acyclic word graphs (PDAWGs), which are the p-matching versions of suffix automata and DAWGs. While suffix automata and DAWGs are equivalent for standard strings, we show that p-suffix automata can have $\Theta(n^2)$ nodes and edges, whereas PDAWGs have only $O(n)$ nodes and edges, where $n$ is the length of the input string. We also give an $O(n |\Pi| \log (|\Pi| + |\Sigma|))$-time, $O(n)$-space algorithm that builds the PDAWG in a left-to-right online manner. As a byproduct, we show that the parameterized suffix tree for the reversed string can be built in the same time and space, in a right-to-left online manner. This duality also leads to two further efficient algorithms for p-matching: (1) given the parameterized suffix tree for the reversal of the input string $T$, one can build the PDAWG of $T$ in $O(n)$ time in an offline manner; (2) one can perform bidirectional p-matching in $O(m \log (|\Pi|+|\Sigma|) + \mathit{occ})$ time using $O(n)$ space, where $m$ is the pattern length and $\mathit{occ}$ is the number of occurrences of the pattern in the text $T$.
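To make the definition of p-matching concrete, here is a minimal Python sketch. It is not the PDAWG construction from the paper. It uses the standard prev-encoding idea for parameterized strings (each parameter symbol is replaced by the distance to its previous occurrence, or 0 if there is none, while static symbols are kept as they are), under which two strings p-match exactly when their encodings are equal. The function names `prev_encode`, `p_match`, and `naive_p_search`, and the example alphabets, are assumptions made for illustration only.

```python
def prev_encode(s, params):
    """Prev-encode s: parameter symbols (members of `params`) become the
    distance to their previous occurrence (0 if none); static symbols
    are kept unchanged."""
    last = {}   # parameter symbol -> index of its most recent occurrence
    out = []
    for i, c in enumerate(s):
        if c in params:
            out.append(i - last[c] if c in last else 0)
            last[c] = i
        else:
            out.append(c)   # static symbols must match literally
    return out


def p_match(x, y, params):
    """True if x and y have equal length and equal prev-encodings."""
    return len(x) == len(y) and prev_encode(x, params) == prev_encode(y, params)


def naive_p_search(text, pattern, params):
    """Start positions of substrings of text that p-match pattern.
    Quadratic-time baseline: every window is re-encoded from scratch,
    because distances must not point outside the window."""
    m = len(pattern)
    target = prev_encode(pattern, params)
    return [i for i in range(len(text) - m + 1)
            if prev_encode(text[i:i + m], params) == target]


if __name__ == "__main__":
    PARAMS = {"x", "y", "z"}   # hypothetical parameter alphabet (Pi); 'a', 'b' are static
    print(p_match("xayx", "yazy", PARAMS))
    print(naive_p_search("xaxyay", "zaz", PARAMS))
```

This baseline takes time proportional to the product of the text and pattern lengths. An index such as the PDAWG is what allows the query bound stated in the abstract, and the sketch makes no attempt to reproduce it.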
Comments: 28 pages, 7 figures
Subjects: Data Structures and Algorithms (cs.DS)
Journal reference: Theoretical Computer Science (2022)
DOI: 10.1016/j.tcs.2022.09.008
Cite as: arXiv:2002.06786 [cs.DS]
  (or arXiv:2002.06786v4 [cs.DS] for this version)

Submission history

From: Diptarama Hendrian
[v1] Mon, 17 Feb 2020 06:08:01 GMT (1697kb,D)
[v2] Sat, 6 Jun 2020 04:32:51 GMT (1882kb,D)
[v3] Sat, 10 Sep 2022 02:27:48 GMT (165kb,D)
[v4] Fri, 16 Sep 2022 04:48:54 GMT (165kb,D)
