Title: How Low Can You Go? Reducing Frequency and Time Resolution in Current CNN Architectures for Music Auto-tagging

Abstract: Automatic tagging of music is an important research topic in Music Information Retrieval, and audio analysis algorithms proposed for this task have improved with advances in deep learning. In particular, many state-of-the-art systems use Convolutional Neural Networks and operate on mel-spectrogram representations of the audio. In this paper, we compare commonly used mel-spectrogram representations and evaluate the model performance that can be achieved when the input size is reduced, both by using fewer frequency bands and by using lower frame rates (i.e., larger hops between frames). We use the MagnaTagATune dataset for comprehensive performance comparisons and then compare selected configurations on the larger Million Song Dataset. The results of this study can help researchers and practitioners trade off model accuracy against data storage size and training and inference times.
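To make the input-size trade-off concrete, here is a minimal sketch that computes the mel-spectrogram shape (frequency bands by time frames) and uncompressed storage for a fixed-length audio patch. The sample rate, hop lengths, band counts, and 3-second patch length are hypothetical illustrative values, not the configurations evaluated in the paper, and `MelConfig` is a helper introduced only for this example. It assumes centered framing (as in librosa's default `center=True`), which yields 1 + floor(n_samples / hop_length) frames, and float32 storage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MelConfig:
    """Hypothetical mel-spectrogram front-end settings (illustrative only)."""
    sample_rate: int  # Hz
    hop_length: int   # samples between consecutive frames
    n_mels: int       # number of mel frequency bands

    @property
    def frame_rate(self) -> float:
        # Frames per second; a larger hop gives a lower frame rate.
        return self.sample_rate / self.hop_length

    def n_frames(self, duration_s: float) -> int:
        # Assumes centered framing: 1 + floor(n_samples / hop_length) frames.
        n_samples = int(round(duration_s * self.sample_rate))
        return 1 + n_samples // self.hop_length

    def input_shape(self, duration_s: float) -> tuple[int, int]:
        # (frequency bands, time frames) fed to the CNN.
        return self.n_mels, self.n_frames(duration_s)

    def storage_bytes(self, duration_s: float, bytes_per_value: int = 4) -> int:
        # Uncompressed size; defaults to 4 bytes per value (float32).
        bands, frames = self.input_shape(duration_s)
        return bands * frames * bytes_per_value


if __name__ == "__main__":
    duration = 3.0  # seconds; hypothetical input patch length
    baseline = MelConfig(sample_rate=16000, hop_length=256, n_mels=128)
    reduced = MelConfig(sample_rate=16000, hop_length=512, n_mels=32)
    for name, cfg in [("baseline", baseline), ("reduced", reduced)]:
        print(name, cfg.input_shape(duration),
              f"{cfg.frame_rate:.2f} frames/s",
              cfg.storage_bytes(duration), "bytes")
    ratio = baseline.storage_bytes(duration) / reduced.storage_bytes(duration)
    print(f"reduction factor: {ratio:.1f}x")
```

With these assumed settings, the baseline patch is 128 x 188 at 62.5 frames/s and the reduced patch is 32 x 94 at 31.25 frames/s, so quartering the number of mel bands and doubling the hop length shrinks each input by a factor of 8. Whether such a reduction costs accuracy is exactly what the paper measures empirically; the arithmetic only quantifies the storage and compute side of the trade-off.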
Comments: The 28th European Signal Processing Conference (EUSIPCO)
Subjects: Information Retrieval (cs.IR); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as: arXiv:1911.04824 [cs.IR]
  (or arXiv:1911.04824v3 [cs.IR] for this version)

Submission history

From: Andres Ferraro
[v1] Tue, 12 Nov 2019 12:50:10 GMT (372kb,D)
[v2] Mon, 2 Mar 2020 18:53:02 GMT (372kb,D)
[v3] Sun, 28 Jun 2020 10:13:16 GMT (1001kb,D)