Title: Towards speech enhancement using a variational U-Net architecture

Abstract: We investigate the viability of a variational U-Net architecture for denoising single-channel audio data. Deep-network speech enhancement systems commonly either estimate filter masks or operate directly on the waveform, potentially neglecting relationships across higher-dimensional spectro-temporal features. We study the incorporation of a probabilistic bottleneck into the classic U-Net architecture for direct spectral reconstruction. Several ablation variants of the network are evaluated with signal-to-distortion ratio and perceptual measures on audio data that includes known and unknown noise types as well as reverberation. Our experiments show that the residual (skip) connections in the proposed system are a prerequisite for successful spectral reconstruction, i.e., reconstruction without filter mask estimation. Under reverberant conditions, the proposed variational U-Net outperforms its classic, non-variational counterpart in signal enhancement by an average of 0.31 in PESQ and 6.98 in STOI. Anecdotal evidence points to improved suppression of impulsive noise sources with the variational U-Net compared to the recurrent mask-estimation network baseline.
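The abstract describes a U-Net whose bottleneck is made probabilistic and whose decoder reconstructs the spectrum directly instead of estimating a mask. The NumPy sketch below illustrates that idea under clearly stated assumptions, not the authors' implementation: dense layers stand in for the convolutional encoder and decoder, there is a single encoder level with one concatenation skip connection, biases are omitted, the input is assumed to be spectral frames such as log-magnitudes, and the names `TinyVariationalUNet`, `vae_loss`, `beta` and all layer sizes are hypothetical. The bottleneck samples its latent code with the reparameterization trick and returns the KL divergence to a standard normal prior.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


class TinyVariationalUNet:
    """Hypothetical toy model: one encoder level, a Gaussian bottleneck,
    one decoder level, and a skip connection from encoder to decoder.
    Dense layers stand in for the convolutions a real U-Net would use."""

    def __init__(self, n_freq=64, n_hidden=32, n_latent=8, seed=0):
        rng = np.random.default_rng(seed)
        scale = 0.1
        self.w_enc = rng.normal(0.0, scale, (n_freq, n_hidden))
        self.w_mu = rng.normal(0.0, scale, (n_hidden, n_latent))
        self.w_logvar = rng.normal(0.0, scale, (n_hidden, n_latent))
        self.w_dec = rng.normal(0.0, scale, (n_latent, n_hidden))
        # The output layer sees decoder features concatenated with skip features.
        self.w_out = rng.normal(0.0, scale, (2 * n_hidden, n_freq))

    def forward(self, x, rng, sample=True):
        """x: (frames, n_freq) noisy spectral frames (assumed representation)."""
        h = relu(x @ self.w_enc)                 # encoder features
        mu = h @ self.w_mu                       # bottleneck mean
        logvar = h @ self.w_logvar               # bottleneck log-variance
        if sample:
            # Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I).
            eps = rng.standard_normal(mu.shape)
            z = mu + np.exp(0.5 * logvar) * eps
        else:
            z = mu                               # deterministic variant
        d = relu(z @ self.w_dec)                 # decoder features
        d = np.concatenate([d, h], axis=-1)      # skip connection (concatenation)
        y_hat = d @ self.w_out                   # direct spectral estimate, no mask
        # KL divergence from N(mu, sigma^2) to N(0, I), one value per frame.
        kl = -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar), axis=-1)
        return y_hat, kl


def vae_loss(y_hat, y_clean, kl, beta=1e-3):
    """Reconstruction MSE plus a beta-weighted KL term (beta is hypothetical)."""
    return np.mean((y_hat - y_clean) ** 2) + beta * np.mean(kl)


# Usage on random stand-in data (not real audio).
model = TinyVariationalUNet()
rng = np.random.default_rng(1)
noisy = rng.normal(size=(10, 64))
clean = rng.normal(size=(10, 64))
y_hat, kl = model.forward(noisy, rng)
loss = vae_loss(y_hat, clean, kl)
```

Removing the `np.concatenate` line (and halving the input size of `w_out`) gives a skip-free ablation of this toy model; the abstract reports that skip connections were a prerequisite for successful spectral reconstruction in the authors' experiments.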
Comments: Submitted to EUSIPCO 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
Cite as: arXiv:2012.03594 [eess.AS]
  (or arXiv:2012.03594v2 [eess.AS] for this version)

Submission history

From: Eike Nustede
[v1] Mon, 7 Dec 2020 11:30:35 GMT (3390kb,D)
[v2] Wed, 3 Mar 2021 09:56:32 GMT (2190kb,D)
