Hybrid Spectrogram and Waveform Source Separation

Défossez, Alexandre

(Submitted on 5 Nov 2021 (v1), last revised 30 Aug 2022 (this version, v3))

Abstract: Source separation models work in either the spectrogram or the waveform domain. In this work, we show how to perform end-to-end hybrid source separation, letting the model decide which domain is best suited for each source, and even combine both. The proposed hybrid version of the Demucs architecture won the Music Demixing Challenge 2021 organized by Sony. This architecture also comes with additional improvements, such as compressed residual branches, local attention, and singular value regularization. Overall, a 1.4 dB improvement in Signal-to-Distortion Ratio (SDR) was observed across all sources as measured on the MusDB HQ dataset. This improvement was confirmed by human subjective evaluation: overall quality was rated at 2.83 out of 5 (2.36 for the non-hybrid Demucs), and absence of contamination at 3.04 (against 2.37 for the non-hybrid Demucs and 2.44 for the second-ranking model submitted to the competition).
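
To make the SDR figure concrete, here is a minimal sketch of a simple energy-ratio SDR in decibels, computed between a reference source and its estimate. The function name `sdr_db` and the `eps` stabilizer are illustrative assumptions, not taken from the paper, and the evaluation protocol used for the reported numbers may differ (for example in how values are aggregated over time windows and tracks).

```python
import math


def sdr_db(reference, estimate, eps=1e-10):
    """Energy-ratio SDR in dB between two equal-length sample sequences.

    Hypothetical helper for illustration; `eps` avoids log(0) and
    division by zero for silent or perfectly estimated signals.
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    # Energy of the true source.
    signal_energy = sum(r * r for r in reference)
    # Energy of the estimation error (distortion).
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))


# Toy signals: a sine wave and two attenuated copies standing in for estimates.
reference = [math.sin(0.1 * n) for n in range(1000)]
estimate_a = [0.9 * r for r in reference]  # error amplitude is 10% of the signal
estimate_b = [0.5 * r for r in reference]  # error amplitude is 50% of the signal

print(round(sdr_db(reference, estimate_a), 2))
print(round(sdr_db(reference, estimate_b), 2))
```

Because the scale is logarithmic, a 1.4 dB gain corresponds to the signal-to-error energy ratio growing by a factor of about 10^(0.14) ≈ 1.38.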

Comments:	ISMIR 2021 MDX Workshop, 11 pages, 2 figures
Subjects:	Audio and Speech Processing (eess.AS); Sound (cs.SD); Machine Learning (stat.ML)
Cite as:	arXiv:2111.03600 [eess.AS]
	(or arXiv:2111.03600v3 [eess.AS] for this version)

Submission history

From: Alexandre Defossez
[v1] Fri, 5 Nov 2021 16:37:45 GMT (802kb,D)
[v2] Mon, 29 Aug 2022 13:26:58 GMT (804kb,D)
[v3] Tue, 30 Aug 2022 16:07:25 GMT (804kb,D)
