Title: Exact, Parallelizable Dynamic Time Warping Alignment with Linear Memory

Abstract: Audio alignment is a fundamental preprocessing step in many music information retrieval (MIR) pipelines. For two audio clips with M and N frames, respectively, the most popular approach, dynamic time warping (DTW), has O(MN) requirements in both memory and computation, which is prohibitive for frame-level alignments at reasonable rates. To address this, a variety of memory-efficient algorithms exist to approximate the optimal alignment under the DTW cost. To our knowledge, however, no exact algorithms exist that are guaranteed to break the quadratic memory barrier. In this work, we present a divide-and-conquer algorithm that computes the exact globally optimal DTW alignment using O(M+N) memory. Its runtime is still O(MN), trading off memory for a 2x increase in computation. However, the algorithm can be parallelized up to a factor of min(M, N) with the same memory constraints, so it can still run more efficiently than the textbook version with an adequate GPU. We use our algorithm to compute exact alignments on a collection of orchestral music, which we use as ground truth to benchmark the alignment accuracy of several popular approximate alignment schemes at scales that were not previously possible.
Comments: 12 pages, 6 figures, 1 table, ISMIR 2020
Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
ACM classes: H.5.5; H.3.3; F.2.1
Cite as: arXiv:2008.02734 [cs.SD]
  (or arXiv:2008.02734v1 [cs.SD] for this version)

Submission history

From: Christopher Tralie
[v1] Tue, 4 Aug 2020 15:00:33 GMT (861kb,D)
