Investigating U-Nets with various Intermediate Blocks for Spectrogram-based Singing Voice Separation

Choi, Woosung; Kim, Minseok; Chung, Jaehwa; Lee, Daewon; Jung, Soonyoung

(Submitted on 2 Dec 2019 (v1), last revised 8 Oct 2020 (this version, v3))

Abstract: Singing Voice Separation (SVS) aims to separate the singing voice from a given mixed musical signal. Many U-Net-based models have recently been proposed for the SVS task, but no existing work evaluates and compares the various types of intermediate blocks that can be used in the U-Net architecture. In this paper, we introduce a variety of intermediate spectrogram transformation blocks. We implement U-Nets based on these blocks and train them on complex-valued spectrograms to consider both magnitude and phase. These networks are then compared on the signal-to-distortion ratio (SDR) metric. A U-Net using a particular block composed of convolutional and fully-connected layers achieves state-of-the-art SDR on the MUSDB singing voice separation task by a large margin of 0.9 dB. Our code and models are available online.
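
For readers unfamiliar with the SDR metric named in the abstract, the following is a minimal sketch of a plain signal-to-distortion ratio in decibels. It is an illustration only: the function name `simple_sdr`, the `eps` guard, and the toy signals are assumptions made here, and the abstract does not say which SDR variant or evaluation toolkit the authors used, so MUSDB benchmark figures may be computed differently.

```python
import math


def simple_sdr(reference, estimate, eps=1e-12):
    """Plain SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2).

    Assumption: this is the textbook energy ratio, not necessarily the
    SDR variant used for the results reported in the paper.
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    # Energy of the true source signal.
    signal_power = sum(s * s for s in reference)
    # Energy of the residual between the true source and the estimate.
    error_power = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    # eps guards against division by zero and log of zero.
    return 10.0 * math.log10((signal_power + eps) / (error_power + eps))


# Toy example (hypothetical data): a 220 Hz tone sampled at 8 kHz stands in
# for the vocal stem, and the "separated" estimate is the tone scaled by 0.9.
sample_rate = 8000
vocals = [math.sin(2.0 * math.pi * 220.0 * n / sample_rate) for n in range(sample_rate)]
estimate = [0.9 * v for v in vocals]

sdr_db = simple_sdr(vocals, estimate)
print(f"SDR: {sdr_db:.2f} dB")
```

Because the metric is logarithmic, a 0.9 dB gain corresponds to the signal-to-error energy ratio improving by a factor of about 10^(0.09), roughly 1.23.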

Comments:	8 pages, 4 tables, 6 figures; accepted to ISMIR 2020
Subjects:	Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Machine Learning (stat.ML)
Cite as:	arXiv:1912.02591 [eess.AS]
	(or arXiv:1912.02591v3 [eess.AS] for this version)

Submission history

From: Woosung Choi
[v1] Mon, 2 Dec 2019 07:46:19 GMT (1933kb,D)
[v2] Mon, 9 Dec 2019 13:56:59 GMT (1934kb,D)
[v3] Thu, 8 Oct 2020 16:39:49 GMT (1494kb,D)
