Real-time Speech Frequency Bandwidth Extension

Li, Yunpeng; Tagliasacchi, Marco; Rybakov, Oleg; Ungureanu, Victor; Roblek, Dominik

(Submitted on 21 Oct 2020 (v1), last revised 9 Feb 2021 (this version, v2))

Abstract: In this paper, we propose a lightweight model for frequency bandwidth extension of speech signals. The model increases the sampling frequency from 8 kHz to 16 kHz while restoring the high-frequency content to a level almost indistinguishable from the 16 kHz ground truth. The architecture is based on SEANet (Sound EnhAncement Network), a wave-to-wave fully convolutional model that uses a combination of feature losses and adversarial losses to reconstruct an enhanced version of the input speech. In addition, we propose a variant of SEANet that can be deployed on-device in streaming mode, achieving an architectural latency of 16 ms. When profiled on a single core of a mobile CPU, processing one 16 ms frame takes only 1.5 ms. This low latency makes the model viable for bi-directional voice communication systems.
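
The figures quoted in the abstract can be turned into a quick real-time budget check. The sketch below is only arithmetic on the numbers stated above (16 ms frame, 1.5 ms compute per frame, 8 kHz input, 16 kHz output). The helper names are hypothetical, and the per-frame sample counts assume that one frame covers exactly 16 ms of audio at each sampling rate, which the abstract does not spell out.

```python
# Figures taken from the abstract.
FRAME_MS = 16.0          # frame duration / architectural latency
COMPUTE_MS = 1.5         # reported time to process one frame on one mobile CPU core
INPUT_RATE_HZ = 8_000    # narrowband input sampling rate
OUTPUT_RATE_HZ = 16_000  # wideband output sampling rate


def samples_per_frame(rate_hz: int, frame_ms: float) -> int:
    """Number of samples covered by one frame (hypothetical helper)."""
    return round(rate_hz * frame_ms / 1000)


def real_time_factor(compute_ms: float, frame_ms: float) -> float:
    """Compute time divided by audio duration; below 1.0 means faster than real time."""
    return compute_ms / frame_ms


in_samples = samples_per_frame(INPUT_RATE_HZ, FRAME_MS)
out_samples = samples_per_frame(OUTPUT_RATE_HZ, FRAME_MS)
rtf = real_time_factor(COMPUTE_MS, FRAME_MS)

print(f"input samples per frame:  {in_samples}")
print(f"output samples per frame: {out_samples}")
print(f"real-time factor:         {rtf:.4f}")
```

Under these assumptions, each frame maps 128 input samples to 256 output samples, and the real-time factor is roughly 0.09, so the reported compute time uses less than a tenth of the time available per frame on that core.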

Subjects:	Audio and Speech Processing (eess.AS); Sound (cs.SD)
Cite as:	arXiv:2010.10677 [eess.AS]
	(or arXiv:2010.10677v2 [eess.AS] for this version)

Submission history

From: Yunpeng Li
[v1] Wed, 21 Oct 2020 00:01:19 GMT (440kb,D)
[v2] Tue, 9 Feb 2021 12:52:17 GMT (440kb,D)
