Computer Science > Sound
Title: Enhanced Factored Three-Way Restricted Boltzmann Machines for Speech Detection
(Submitted on 1 Nov 2016 (v1), last revised 20 Apr 2017 (this version, v3))
Abstract: In this letter, we propose enhanced factored three-way restricted Boltzmann machines (EFTW-RBMs) for speech detection. The proposed model incorporates conditional feature learning by multiplying the dynamical state of the third unit, which allows a modulation over the visible-hidden node pairs. Instead of stacking previous frames of speech as the third unit in a recursive manner, correlation-related weighting coefficients are assigned to the contextual neighboring frames. Specifically, a threshold function is designed to capture the long-term features and blend the globally stored speech structure. A factored low-rank approximation is introduced to reduce the parameters of the three-dimensional interaction tensor, on which a non-negative constraint is imposed to address the sparsity characteristic. Validation using the area-under-ROC-curve (AUC) and signal distortion ratio (SDR) shows that our approach outperforms several existing 1D and 2D (i.e., time and time-frequency domain) speech detection algorithms in various noisy environments.
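The factored low-rank approximation mentioned in the abstract can be illustrated with a minimal sketch. It assumes the commonly used three-factor form W[i][j][k] = sum_f A[i][f] * B[j][f] * C[k][f]; the abstract does not state the exact parameterization, so the names A, B, C, the helper functions, and the clipping step used for non-negativity are all hypothetical and are not taken from the letter.

```python
import random


def factored_term(v, h, z, A, B, C):
    """Three-way interaction computed through the factors only.

    Assumed form: sum_f (v . A[:, f]) * (h . B[:, f]) * (z . C[:, f]),
    where v, h, z stand for the visible, hidden, and third (context) units.
    """
    total = 0.0
    for f in range(len(A[0])):
        pv = sum(v[i] * A[i][f] for i in range(len(v)))
        ph = sum(h[j] * B[j][f] for j in range(len(h)))
        pz = sum(z[k] * C[k][f] for k in range(len(z)))
        total += pv * ph * pz
    return total


def full_tensor_term(v, h, z, A, B, C):
    """Same quantity via the explicit tensor, for comparison only."""
    n_factors = len(A[0])
    total = 0.0
    for i in range(len(v)):
        for j in range(len(h)):
            for k in range(len(z)):
                w = sum(A[i][f] * B[j][f] * C[k][f] for f in range(n_factors))
                total += w * v[i] * h[j] * z[k]
    return total


def clip_nonnegative(M):
    """One assumed way to keep factors non-negative: clip entries at zero."""
    return [[max(0.0, x) for x in row] for row in M]


def param_counts(n_v, n_h, n_z, n_f):
    """Parameter count of the full tensor versus the three factor matrices."""
    return n_v * n_h * n_z, (n_v + n_h + n_z) * n_f


rng = random.Random(0)
n_v, n_h, n_z, n_f = 4, 3, 5, 2  # toy sizes, chosen arbitrarily


def rand_matrix(rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


A = clip_nonnegative(rand_matrix(n_v, n_f))
B = clip_nonnegative(rand_matrix(n_h, n_f))
C = clip_nonnegative(rand_matrix(n_z, n_f))
v = [rng.random() for _ in range(n_v)]
h = [rng.random() for _ in range(n_h)]
z = [rng.random() for _ in range(n_z)]

print(factored_term(v, h, z, A, B, C))
print(param_counts(n_v, n_h, n_z, n_f))
```

Under this assumed form, the factored computation never materializes the full tensor, and the parameter count grows with the sum of the three layer sizes rather than their product.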
Submission history
From: Pengfei Sun
[v1] Tue, 1 Nov 2016 18:38:12 GMT (2371kb,D)
[v2] Fri, 27 Jan 2017 06:01:20 GMT (2810kb,D)
[v3] Thu, 20 Apr 2017 18:43:29 GMT (2810kb,D)