Speech Enhancement using Separable Polling Attention and Global Layer Normalization followed with PReLU

Ke, Dengfeng; Zhang, Jinsong; Xie, Yanlu; Xu, Yanyan; Lin, Binghuai

(Submitted on 6 May 2021)

Abstract: Single-channel speech enhancement is a challenging task in the speech community. Recently, various neural-network-based methods have been applied to speech enhancement. Among these models, PHASEN and T-GSA achieve state-of-the-art performance on the publicly available VoiceBank+DEMAND corpus. Both models reach a COVL score of 3.62. PHASEN achieves the highest CSIG score, 4.21, while T-GSA achieves the highest PESQ score, 3.06. However, both models are very large, and the trade-off between model performance and model size is hard to reconcile. In this paper, we introduce three techniques to shrink the PHASEN model and improve its performance. First, separable polling attention is proposed to replace the frequency transformation blocks in PHASEN. Second, global layer normalization followed by PReLU replaces batch normalization followed by ReLU. Finally, the BLSTM in PHASEN is replaced with a Conv2d operation, and the phase stream is simplified. With all these modifications, the PHASEN model shrinks from 33M parameters to 5M parameters, while its performance on VoiceBank+DEMAND improves to a CSIG score of 4.30, a PESQ score of 3.07, and a COVL score of 3.73.
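The abstract does not define global layer normalization or give any layer details, so the following is only a minimal sketch of what "global layer normalization followed by PReLU" could look like. It assumes the commonly used formulation in which a single mean and variance are computed over every element of one feature map, followed by a per-channel gain and bias. The function names, the channels-by-frames layout, and the initial PReLU slope of 0.25 are all illustrative assumptions, not taken from the paper.

```python
import math


def global_layer_norm(x, gamma, beta, eps=1e-8):
    """Normalize a channels-by-frames feature map with ONE mean and variance
    computed over all elements, then apply a per-channel gain and bias.
    (Assumed formulation; the abstract does not spell it out.)"""
    values = [v for row in x for v in row]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    scale = 1.0 / math.sqrt(var + eps)
    return [[g * (v - mean) * scale + b for v in row]
            for row, g, b in zip(x, gamma, beta)]


def prelu(x, slope):
    """PReLU: keep positive values, multiply negative values by a slope
    that would be learned during training."""
    return [[v if v > 0 else slope * v for v in row] for row in x]


def gln_prelu(x, gamma, beta, slope=0.25):
    """Hypothetical helper chaining the two steps; slope=0.25 is an assumed
    starting value, not a number from the paper."""
    return prelu(global_layer_norm(x, gamma, beta), slope)


# Toy feature map: 2 channels, 3 frames.
features = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
out = gln_prelu(features, gamma=[1.0, 1.0], beta=[0.0, 0.0])
```

Unlike batch normalization, the statistics in this sketch depend only on the single input being processed, so the result does not change with batch size or batch composition.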

Subjects:	Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2105.02509 [cs.SD]
	(or arXiv:2105.02509v1 [cs.SD] for this version)

Submission history

From: Dengfeng Ke
[v1] Thu, 6 May 2021 08:18:02 GMT (244kb,D)
