Contextual Pattern Matching

Navarro, Gonzalo

(Submitted on 14 Oct 2020)

Abstract: Research on indexing repetitive string collections has focused on the same search problems studied for regular string collections, even though these problems can make little sense in this scenario. For example, the basic pattern matching query "list all the positions where pattern $P$ appears" can produce huge outputs when $P$ appears in an area shared by many documents. All those occurrences are essentially the same.
In this paper we propose a new query that can be more appropriate for these collections, which we call contextual pattern matching. The basic query of this type receives, in addition to $P$, a context length $\ell$, and asks to report the occurrences of all distinct strings $XPY$, with $|X|=|Y|=\ell$.
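The following brute-force sketch in Python makes the query semantics concrete. It is not the index described in the paper. It scans the text for every occurrence of $P$, extracts the window $XPY$ with $|X|=|Y|=\ell$, and keeps one representative position per distinct window. Several details are assumptions made for illustration: the function name `contextual_occurrences`, the use of 0-based positions, the choice of the leftmost occurrence as the representative, and the rule that skips occurrences lying too close to either end of the text to have a full context.

```python
def contextual_occurrences(text: str, pattern: str, ell: int) -> dict[str, int]:
    """Brute-force contextual pattern matching (illustrative sketch only).

    Maps each distinct string X + pattern + Y, with len(X) == len(Y) == ell,
    to the leftmost 0-based starting position of `pattern` inside that window.
    Occurrences lacking a full context of length ell on either side are skipped
    (an assumption made for this sketch).
    """
    m = len(pattern)
    result: dict[str, int] = {}
    start = text.find(pattern)
    while start != -1:
        lo, hi = start - ell, start + m + ell
        if lo >= 0 and hi <= len(text):
            window = text[lo:hi]
            # Keep only the first (leftmost) occurrence of each distinct window.
            result.setdefault(window, start)
        # Advance by one so that overlapping occurrences are also found.
        start = text.find(pattern, start + 1)
    return result
```

For example, `contextual_occurrences("xabcyxabcyzabcy", "abc", 1)` finds three occurrences of `abc` but only two distinct contexts: `xabcy` (first seen at position 1) and `zabcy` (at position 11). The loop touches every occurrence of $P$, not just the $c$ contextual ones. That is exactly the cost a contextual index aims to avoid on repetitive collections.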
While this query is easily solved in optimal time and linear space, we focus on using space related to the repetitiveness of the text collection, and present the first solution of this kind. Letting $\overline{r}$ be the maximum of the number of runs in the Burrows-Wheeler Transform (BWT) of the text $T[1..n]$ and the number of runs in the BWT of its reverse, our structure uses $O(\overline{r}\log(n/\overline{r}))$ space and finds the $c$ contextual occurrences $XPY$ of $(P,\ell)$ in time $O(|P| + c \log n)$. We also give other space/time tradeoffs, for both compressed and uncompressed indexes.
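A naive computation can illustrate the repetitiveness measure $\overline{r}$. The sketch below builds the BWT by sorting all rotations of the text with an appended sentinel. It then counts the maximal runs of equal symbols and takes the maximum over the text and its reverse. Several choices are assumptions for illustration only: the helper names `bwt`, `count_runs` and `r_bar`, and the quadratic-time construction. The `$` sentinel is also assumed, along with two conditions on it: it must not occur in the text, and it must be smaller than every text symbol. Both hold for alphanumeric input under Python's code-point string ordering. Practical indexes build the BWT far more efficiently.

```python
def bwt(text: str) -> str:
    """Naive Burrows-Wheeler Transform: sort all rotations of text + '$' and
    take the last column. Assumes '$' does not occur in `text` and sorts
    before its symbols (true for alphanumeric text). For illustration only."""
    s = text + "$"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def count_runs(s: str) -> int:
    """Number of maximal runs of equal consecutive symbols in s."""
    return sum(1 for i in range(len(s)) if i == 0 or s[i] != s[i - 1])


def r_bar(text: str) -> int:
    """Maximum of the number of BWT runs of `text` and of its reverse."""
    return max(count_runs(bwt(text)), count_runs(bwt(text[::-1])))
```

For `banana`, the BWT is `annb$aa` (5 runs) and the BWT of the reverse `ananab` is `bnn$aaa` (4 runs), so `r_bar("banana")` returns 5. On highly repetitive collections this value grows much more slowly than $n$. This is why space bounds stated in terms of $\overline{r}$ can be far smaller than linear space.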

Comments:	Improvements and corrections over my SPIRE 2020 paper with the same title
Subjects:	Data Structures and Algorithms (cs.DS)
Cite as:	arXiv:2010.07076 [cs.DS]
	(or arXiv:2010.07076v1 [cs.DS] for this version)

Submission history

From: Gonzalo Navarro
[v1] Wed, 14 Oct 2020 13:25:51 GMT (46kb)