Title: Investigating U-Nets with various Intermediate Blocks for Spectrogram-based Singing Voice Separation

Abstract: Singing Voice Separation (SVS) aims to separate the singing voice from a given mixed musical signal. Many U-Net-based models have recently been proposed for the SVS task, but no existing work has evaluated and compared the various types of intermediate blocks that can be used in the U-Net architecture. In this paper, we introduce a variety of intermediate spectrogram transformation blocks. We implement U-Nets based on these blocks and train them on complex-valued spectrograms so that both magnitude and phase are taken into account. These networks are then compared on the Signal-to-Distortion Ratio (SDR) metric. A U-Net using a particular block composed of convolutional and fully-connected layers achieves state-of-the-art SDR on the MUSDB singing voice separation task, by a large margin of 0.9 dB. Our code and models are available online.
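
For readers unfamiliar with the two technical ingredients named in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes a simplified energy-ratio definition of SDR (benchmark evaluations on MUSDB typically rely on a more elaborate BSS Eval procedure) and one common way of feeding a complex-valued spectrogram to a real-valued network, namely stacking real and imaginary parts as two channels. The function names `simple_sdr`, `complex_to_channels`, and `channels_to_complex` are hypothetical and chosen only for illustration.

```python
import numpy as np


def simple_sdr(reference, estimate, eps=1e-10):
    """Simplified SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2).

    Assumption: this is the plain energy-ratio form, not the full
    BSS Eval decomposition used in benchmark evaluations.
    """
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    signal_power = np.sum(reference ** 2)
    error_power = np.sum((reference - estimate) ** 2)
    # eps guards against division by zero and log of zero.
    return 10.0 * np.log10((signal_power + eps) / (error_power + eps))


def complex_to_channels(spec):
    """Stack real and imaginary parts of a complex spectrogram.

    Input shape (freq, time), output shape (2, freq, time).
    Assumption: a two-channel real/imaginary layout; the paper's exact
    input layout is not stated in the abstract.
    """
    spec = np.asarray(spec)
    return np.stack([spec.real, spec.imag], axis=0)


def channels_to_complex(channels):
    """Inverse of complex_to_channels: rebuild the complex spectrogram."""
    channels = np.asarray(channels)
    return channels[0] + 1j * channels[1]
```

Under this simplified definition, an estimate that is uniformly 10% too quiet leaves an error energy of 1% of the signal energy, which corresponds to 20 dB. Because both real and imaginary parts are kept, the phase survives the round trip through the two-channel representation.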
Comments: 8 pages, 4 tables, 6 figures; accepted to ISMIR 2020
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Machine Learning (stat.ML)
Cite as: arXiv:1912.02591 [eess.AS]
  (or arXiv:1912.02591v3 [eess.AS] for this version)

Submission history

From: Woosung Choi
[v1] Mon, 2 Dec 2019 07:46:19 GMT (1933kb,D)
[v2] Mon, 9 Dec 2019 13:56:59 GMT (1934kb,D)
[v3] Thu, 8 Oct 2020 16:39:49 GMT (1494kb,D)
