End-to-End Complex-Valued Multidilated Convolutional Neural Network for Joint Acoustic Echo Cancellation and Noise Suppression

Watcharasupat, Karn N.; Nguyen, Thi Ngoc Tho; Gan, Woon-Seng; Zhao, Shengkui; Ma, Bin

doi:10.1109/ICASSP43922.2022.9747034

(Submitted on 2 Oct 2021 (v1), last revised 22 Jan 2022 (this version, v3))

Abstract: Echo and noise suppression is an integral part of a full-duplex communication system. Many recent acoustic echo cancellation (AEC) systems rely on a separate adaptive filtering module for linear echo suppression and a neural module for residual echo suppression. However, not only do adaptive filtering modules require convergence and remain susceptible to changes in acoustic environments, but this two-stage framework also often introduces unnecessary delays to the AEC system when neural modules are already capable of both linear and nonlinear echo suppression. In this paper, we exploit the offset-compensating ability of complex time-frequency masks and propose an end-to-end complex-valued neural network architecture. The building block of the proposed model is a pseudocomplex extension based on the densely-connected multidilated DenseNet (D3Net) building block, resulting in a very small network of only 354K parameters. The architecture utilizes the multi-resolution nature of the D3Net building blocks to eliminate the need for pooling, allowing the network to extract features using large receptive fields without any loss of output resolution. We also propose a dual-mask technique for joint echo and noise suppression with simultaneous speech enhancement. Evaluation on both synthetic and real test sets demonstrated promising results across multiple energy-based metrics and perceptual proxies.
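
The "offset-compensating ability" of a complex mask mentioned in the abstract can be illustrated with a toy example. The sketch below is not the paper's model: it assumes a single frame, a circular delay (so that the delay is exactly a per-bin phase rotation in the DFT domain), and hypothetical helper names (`circular_delay`, `offset_compensating_mask`). With windowed STFT frames and real acoustic delays the compensation would only be approximate. The point is only that a unit-modulus complex mask can undo a time offset, whereas a real-valued (magnitude-only) mask cannot.

```python
import numpy as np


def circular_delay(x, d):
    """Delay a frame by d samples with wrap-around (toy assumption)."""
    return np.roll(x, d)


def offset_compensating_mask(n_fft, d):
    """Unit-modulus complex mask that undoes a circular delay of d samples.

    A circular delay multiplies bin k by exp(-2j*pi*k*d/n_fft), so the
    mask applies the opposite phase rotation.
    """
    k = np.arange(n_fft // 2 + 1)
    return np.exp(2j * np.pi * k * d / n_fft)


rng = np.random.default_rng(0)
n_fft, delay = 512, 7

frame = rng.standard_normal(n_fft)      # reference frame
delayed = circular_delay(frame, delay)  # misaligned observation

X = np.fft.rfft(frame)
Y = np.fft.rfft(delayed)

# Complex mask: rotates the phase of every bin back into alignment.
complex_mask = offset_compensating_mask(n_fft, delay)
recovered = np.fft.irfft(complex_mask * Y, n=n_fft)
complex_err = np.max(np.abs(recovered - frame))

# Real-valued mask: can only rescale magnitudes. Since |X| == |Y| here,
# the ideal magnitude mask is ~1 and leaves the offset untouched.
real_mask = np.abs(X) / np.maximum(np.abs(Y), 1e-12)
real_recovered = np.fft.irfft(real_mask * Y, n=n_fft)
real_err = np.max(np.abs(real_recovered - frame))

print(f"max error with complex mask: {complex_err:.2e}")
print(f"max error with real mask:    {real_err:.2e}")
```

In this idealized setting the complex mask recovers the reference frame to numerical precision, while the magnitude-only mask returns the delayed frame essentially unchanged.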

Comments:	To be presented at the 2022 International Conference on Acoustics, Speech, & Signal Processing (ICASSP)
Subjects:	Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
Journal reference:	Proceedings of the 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022, pp. 656-660
DOI:	10.1109/ICASSP43922.2022.9747034
Cite as:	arXiv:2110.00745 [eess.AS]
	(or arXiv:2110.00745v3 [eess.AS] for this version)

Submission history

From: Karn N Watcharasupat [view email]
[v1] Sat, 2 Oct 2021 07:41:41 GMT (385kb,D)
[v2] Mon, 11 Oct 2021 20:03:32 GMT (385kb,D)
[v3] Sat, 22 Jan 2022 11:50:43 GMT (385kb,D)
