AmbiSep: Ambisonic-to-Ambisonic Reverberant Speech Separation Using Transformer Networks

Herzog, Adrian; Chetupalli, Srikanth Raj; Habets, Emanuël A. P.

(Submitted on 13 Jun 2022)

Abstract: Consider a multichannel Ambisonic recording containing a mixture of several reverberant speech signals. Retrieving the reverberant Ambisonic signals corresponding to the individual speech sources blindly from the mixture is a challenging task, as it requires estimating multiple signal channels for each source. In this work, we propose AmbiSep, a deep neural network-based plane-wave domain masking approach to solve this task. The masking network uses learned feature representations and transformers in a triple-path processing configuration. We train and evaluate the proposed network architecture on a spatialized WSJ0-2mix dataset, and show that the method achieves a multichannel scale-invariant signal-to-distortion ratio improvement of 17.7 dB on the blind test set, while preserving the spatial characteristics of the separated sounds.
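
The abstract reports its result as a multichannel scale-invariant signal-to-distortion ratio (SI-SDR) improvement. As a rough illustration of what such a metric measures, the following is a minimal sketch in plain Python. It assumes that a single scaling factor is shared by all Ambisonic channels of a source; the abstract does not state the exact definition used, so this is an assumption and may differ from the paper. The function names `si_sdr_db` and `si_sdr_improvement_db` are hypothetical.

```python
import math


def si_sdr_db(estimate, reference):
    """Scale-invariant SDR in dB, computed jointly over all channels.

    Both arguments are lists of channels; each channel is a list of samples.
    Assumption (not taken from the paper): one scaling factor is shared by
    all channels, so the signals are flattened before projection.
    Raises ZeroDivisionError if the reference is silent or the estimate is
    an exact scaled copy of the reference.
    """
    est = [s for channel in estimate for s in channel]
    ref = [s for channel in reference for s in channel]
    if len(est) != len(ref):
        raise ValueError("estimate and reference must have the same shape")

    # Project the estimate onto the reference to find the optimal scaling.
    ref_energy = sum(r * r for r in ref)
    alpha = sum(e * r for e, r in zip(est, ref)) / ref_energy

    # Split the estimate into a scaled target and a residual distortion.
    target_energy = alpha * alpha * ref_energy
    distortion_energy = sum((e - alpha * r) ** 2 for e, r in zip(est, ref))
    return 10.0 * math.log10(target_energy / distortion_energy)


def si_sdr_improvement_db(estimate, mixture, reference):
    """Gain in SI-SDR of the separated estimate over the unprocessed mixture."""
    return si_sdr_db(estimate, reference) - si_sdr_db(mixture, reference)
```

Under this definition, rescaling the estimate by any nonzero constant leaves the score unchanged, which is the sense in which the measure is scale-invariant. The improvement is the score of the separated signal minus the score of the unprocessed mixture, both measured against the same reference.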

Comments:	Preprint submitted to IWAENC 2022
Subjects:	Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
Cite as:	arXiv:2206.06184 [eess.AS]
	(or arXiv:2206.06184v1 [eess.AS] for this version)

Submission history

From: Adrian Herzog
[v1] Mon, 13 Jun 2022 14:18:17 GMT (267kb,D)
