Title: Merging Sorted Lists of Similar Strings

Authors: Gene Myers
Abstract: Merging $T$ sorted, non-redundant lists containing $M$ elements in total into a single sorted, non-redundant result of size $N \ge M/T$ is a classic problem, typically solved in practice in $O(M \log T)$ time with a priority-queue data structure, the most basic of which is the simple *heap*. We revisit this problem in the situation where the list elements are *strings* and the lists contain many *identical or nearly identical elements*. By keeping simple auxiliary information with each heap node, we devise an $O(M \log T+S)$ worst-case method that performs no more character comparisons than $S$, the sum of the lengths of all the strings, and another $O(M \log (T/ \bar e)+S)$ method that becomes progressively more efficient as a function of the fraction of equal elements $\bar e = M/N$ between input lists, reaching linear time when the lists are all identical. The methods perform favorably in practice versus an alternative formulation based on a trie.
Comments: 13 pages. Associated code at this https URL
Subjects: Data Structures and Algorithms (cs.DS)
MSC classes: 68W40 (Primary) 68P05, 68P10, 68W05, 68W32 (Secondary)
ACM classes: E.1; F.2.2
Cite as: arXiv:2208.09351 [cs.DS]
  (or arXiv:2208.09351v1 [cs.DS] for this version)
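
As a point of reference for the abstract, the classic $O(M \log T)$ heap-based merge can be sketched as follows. This is only the textbook baseline the abstract starts from, not the paper's methods: it keeps no auxiliary information in the heap nodes, so each heap comparison may rescan shared string prefixes. The function name `merge_unique` is hypothetical and chosen for illustration.

```python
import heapq
from typing import List, Tuple


def merge_unique(lists: List[List[str]]) -> List[str]:
    """Merge T sorted, duplicate-free lists into one sorted, duplicate-free list.

    Baseline only: plain string comparisons, no per-node auxiliary data.
    """
    # Each heap entry is (string, list index, position in that list).
    # The list index breaks ties, so equal strings from different lists
    # never force a comparison beyond the first two tuple fields.
    heap: List[Tuple[str, int, int]] = [
        (lst[0], idx, 0) for idx, lst in enumerate(lists) if lst
    ]
    heapq.heapify(heap)

    out: List[str] = []
    while heap:
        value, idx, pos = heapq.heappop(heap)
        # Equal strings are popped consecutively; keep only the first copy.
        if not out or out[-1] != value:
            out.append(value)
        # Refill the heap from the list the popped element came from.
        nxt = pos + 1
        if nxt < len(lists[idx]):
            heapq.heappush(heap, (lists[idx][nxt], idx, nxt))
    return out


print(merge_unique([["ab", "abc"], ["ab", "b"], []]))  # ['ab', 'abc', 'b']
```

In this small example $T = 3$, $M = 4$, and $N = 3$. Each of the $M$ elements passes through the heap once at $O(\log T)$ comparisons per operation, which gives the $O(M \log T)$ comparison bound; the cost of each individual string comparison is what the paper's auxiliary heap-node information is meant to control.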

Submission history

From: Eugene Myers
[v1] Fri, 19 Aug 2022 14:02:53 GMT (18kb)
