Title: Improving Perceptual Quality of Drum Transcription with the Expanded Groove MIDI Dataset

Abstract: We introduce the Expanded Groove MIDI dataset (E-GMD), an automatic drum transcription (ADT) dataset that contains 444 hours of audio from 43 drum kits, making it an order of magnitude larger than similar datasets, and the first with human-performed velocity annotations. We use E-GMD to optimize classifiers for use in downstream generation by predicting expressive dynamics (velocity) and show with listening tests that they produce outputs with improved perceptual quality, despite similar results on classification metrics. Via the listening tests, we argue that standard classifier metrics, such as accuracy and F-measure score, are insufficient proxies of performance in downstream tasks because they do not fully align with the perceptual quality of generated outputs.
Comments: Examples available at this https URL
Subjects: Sound (cs.SD); Machine Learning (cs.LG)
Cite as: arXiv:2004.00188 [cs.SD]
  (or arXiv:2004.00188v5 [cs.SD] for this version)

Submission history

From: Curtis Hawthorne
[v1] Wed, 1 Apr 2020 01:24:42 GMT (229kb,D)
[v2] Wed, 8 Apr 2020 23:44:51 GMT (229kb,D)
[v3] Fri, 8 May 2020 12:03:43 GMT (229kb,D)
[v4] Wed, 27 May 2020 00:30:10 GMT (108kb,D)
[v5] Tue, 1 Dec 2020 18:11:04 GMT (88kb,D)