Computer Science > Machine Learning
Title: Efficient keyword spotting using dilated convolutions and gating
(Submitted on 19 Nov 2018 (v1), last revised 18 Feb 2019 (this version, v2))
Abstract: We explore the application of end-to-end stateless temporal modeling to small-footprint keyword spotting, as opposed to recurrent networks that model long-term temporal dependencies using internal states. We propose a model inspired by the recent success of dilated convolutions in sequence modeling applications, which allows deeper architectures to be trained in resource-constrained configurations. Gated activations and residual connections are also added, following a configuration similar to WaveNet. In addition, we apply a custom target labeling that back-propagates loss only from specific frames of interest, which yields higher accuracy and requires only that the end of the keyword be detected. Our experimental results show that our model outperforms a recurrent neural network with LSTM cells trained with a max-pooling loss, with a significant decrease in false rejection rate. The underlying dataset ("Hey Snips" utterances recorded by over 2.2K different speakers) has been made publicly available to establish an open reference for wake-word detection.
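The gated, dilated, residual layer mentioned in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the scalar (single-channel) signal, the causal zero-padding, and the omission of the 1x1 projections and skip connections used in WaveNet are all simplifying assumptions made here for illustration.

```python
import math


def dilated_causal_conv(x, weights, dilation):
    """1-D dilated causal convolution over a scalar sequence (hypothetical helper).

    weights[k] multiplies the frame at t - k * dilation; frames before the
    start of the sequence are treated as zero (assumed padding scheme).
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


def gated_residual_block(x, filter_w, gate_w, dilation):
    """WaveNet-style gated activation, tanh(filter) * sigmoid(gate), plus a residual add.

    Simplified: real layers use many channels and 1x1 convolutions, omitted here.
    """
    f = dilated_causal_conv(x, filter_w, dilation)
    g = dilated_causal_conv(x, gate_w, dilation)
    z = [math.tanh(a) * (1.0 / (1.0 + math.exp(-b))) for a, b in zip(f, g)]
    return [xi + zi for xi, zi in zip(x, z)]


def receptive_field(kernel_size, dilations):
    """Number of input frames seen by one output frame of a stack of dilated layers."""
    return 1 + (kernel_size - 1) * sum(dilations)
```

Under these assumptions, stacking layers with dilations 1, 2, 4, 8 and a kernel size of 2 gives a receptive field of 16 frames, which is how dilation lets a stateless model cover a long temporal context with few layers.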
Submission history
From: Alice Coucke
[v1] Mon, 19 Nov 2018 13:51:10 GMT (329kb,D)
[v2] Mon, 18 Feb 2019 16:21:04 GMT (398kb,D)