Convolutional Recurrent Neural Network with Attention for 3D Speech Enhancement

Yin, Han; Bai, Jisheng; Wang, Mou; Huang, Siwei; Jia, Yafei; Chen, Jianfeng

(Submitted on 8 Jun 2023 (v1), last revised 20 Nov 2023 (this version, v4))

Abstract: 3D speech enhancement can effectively improve the auditory experience and plays a crucial role in augmented reality technology. However, traditional convolution-based speech enhancement methods are limited in their ability to extract dynamic voice information. In this paper, we incorporate a dual-path recurrent neural network block into the U-Net to iteratively extract dynamic audio information in both the time and frequency domains. We also propose an attention mechanism to fuse the original signal, reference signal, and generated masks. Moreover, we introduce a loss function that simultaneously optimizes the network in the time-frequency and time domains. Experimental results show that our system outperforms the state-of-the-art systems on the dataset of the ICASSP L3DAS23 challenge.
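
The abstract does not give the form of the loss function, so the following is only a minimal sketch of what optimizing in the time-frequency and time domains at once could look like. It assumes an L1 term on the waveform plus an L1 term on frame-wise magnitude spectra, combined with a weight. The names combined_loss, frame_magnitudes, mean_abs_error and the parameters alpha, frame_len and hop are hypothetical and do not come from the paper. The naive DFT stands in for a real STFT only to keep the example dependency-free.

```python
import cmath
import math


def frame_magnitudes(signal, frame_len=8, hop=4):
    """Magnitude spectra of overlapping frames via a naive DFT (no window)."""
    spectra = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        mags = []
        # Keep only the non-redundant bins of a real-valued frame.
        for k in range(frame_len // 2 + 1):
            coeff = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n, x in enumerate(frame)
            )
            mags.append(abs(coeff))
        spectra.append(mags)
    return spectra


def mean_abs_error(a, b):
    """Mean absolute difference between two equal-length sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def combined_loss(estimate, target, alpha=0.5, frame_len=8, hop=4):
    """Weighted sum of a time-domain and a time-frequency-domain L1 term.

    alpha is an assumed weighting, not a value taken from the paper.
    """
    if len(estimate) != len(target):
        raise ValueError("estimate and target must have the same length")
    if len(target) < frame_len:
        raise ValueError("signals must be at least frame_len samples long")

    # Time-domain term: sample-wise L1 distance between the waveforms.
    time_term = mean_abs_error(estimate, target)

    # Time-frequency term: L1 distance between magnitude spectrograms.
    est_spec = frame_magnitudes(estimate, frame_len, hop)
    tgt_spec = frame_magnitudes(target, frame_len, hop)
    flat_est = [m for frame in est_spec for m in frame]
    flat_tgt = [m for frame in tgt_spec for m in frame]
    tf_term = mean_abs_error(flat_est, flat_tgt)

    return alpha * time_term + (1 - alpha) * tf_term


if __name__ == "__main__":
    clean = [math.sin(2 * math.pi * n / 8) for n in range(16)]
    silent = [0.0] * 16
    print(combined_loss(clean, clean))   # identical signals give zero loss
    print(combined_loss(silent, clean))  # a mismatched estimate gives a positive loss
```

Setting alpha to 1.0 reduces this sketch to the time-domain term alone, and 0.0 to the spectral term alone. In a training setup the same structure would normally be written with a differentiable STFT from a deep learning framework.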

Comments:	Published in the IEEE International Conference on Signal Processing, Communications and Computing (ICSPCC 2023)
Subjects:	Audio and Speech Processing (eess.AS); Sound (cs.SD)
Cite as:	arXiv:2306.04987 [eess.AS]
	(or arXiv:2306.04987v4 [eess.AS] for this version)

Submission history

From: Han Yin
[v1] Thu, 8 Jun 2023 07:19:14 GMT (5311kb,D)
[v2] Tue, 4 Jul 2023 03:06:00 GMT (5476kb,D)
[v3] Sat, 30 Sep 2023 04:41:21 GMT (5476kb,D)
[v4] Mon, 20 Nov 2023 03:00:20 GMT (5476kb,D)
