Audio-Visual Speech Separation in Noisy Environments with a Lightweight Iterative Model

Martel, Héctor; Richter, Julius; Li, Kai; Hu, Xiaolin; Gerkmann, Timo

(Submitted on 31 May 2023)

Abstract: We propose the Audio-Visual Lightweight ITerative model (AVLIT), an effective and lightweight neural network that uses Progressive Learning (PL) to perform audio-visual speech separation in noisy environments. To this end, we adopt the Asynchronous Fully Recurrent Convolutional Neural Network (A-FRCNN), which has shown successful results in audio-only speech separation. Our architecture consists of an audio branch and a video branch, with iterative A-FRCNN blocks sharing weights for each modality. We evaluated our model in a controlled environment using the NTCD-TIMIT dataset and in the wild using a synthetic dataset that combines LRS3 and WHAM!. The experiments demonstrate the superiority of our model in both settings with respect to various audio-only and audio-visual baselines. Furthermore, the reduced footprint of our model makes it suitable for low-resource applications.
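
The abstract attributes the small footprint to iterative blocks that share weights within each modality. The following is a minimal sketch of that general idea only, not the authors' implementation: `SharedBlock` is a hypothetical stand-in for an A-FRCNN block (here a single affine layer with tanh), and `iterate_shared`, `count_parameters`, and the choice to re-inject the input at every iteration are assumptions made for illustration.

```python
import math
import random


class SharedBlock:
    """Hypothetical stand-in for one A-FRCNN block: affine layer + tanh."""

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        # One weight matrix and one bias vector, created once and reused.
        self.weight = [[rng.uniform(-0.5, 0.5) for _ in range(dim)]
                       for _ in range(dim)]
        self.bias = [0.0] * dim

    def __call__(self, x):
        return [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
                for row, b in zip(self.weight, self.bias)]


def iterate_shared(block, features, num_iterations):
    """Apply the same block num_iterations times (assumed scheme).

    The input features are added back to the running state before each
    pass, so every iteration refines the previous estimate.
    """
    state = [0.0] * len(features)
    outputs = []
    for _ in range(num_iterations):
        state = block([s + f for s, f in zip(state, features)])
        outputs.append(state)
    return outputs


def count_parameters(block):
    """Number of scalar parameters held by the block."""
    return sum(len(row) for row in block.weight) + len(block.bias)


if __name__ == "__main__":
    audio_block = SharedBlock(dim=4, seed=1)  # one block per modality
    video_block = SharedBlock(dim=4, seed=2)
    audio_out = iterate_shared(audio_block, [0.1, -0.2, 0.3, 0.0], 5)
    video_out = iterate_shared(video_block, [0.5, 0.5, -0.1, 0.2], 3)
    print(len(audio_out), len(video_out))
    print(count_parameters(audio_block) + count_parameters(video_block))
```

Because each modality reuses a single block, the parameter count in this sketch depends only on the block size, not on the number of iterations; running more iterations costs compute but no additional weights.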

Comments:	Accepted by Interspeech 2023
Subjects:	Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
Cite as:	arXiv:2306.00160 [eess.AS]
	(or arXiv:2306.00160v1 [eess.AS] for this version)

Submission history

From: Héctor Martel
[v1] Wed, 31 May 2023 20:09:50 GMT (289kb,D)
