Improving Label-Deficient Keyword Spotting Using Self-Supervised Pretraining

Bovbjerg, Holger Severin; Tan, Zheng-Hua

Authors: Holger Severin Bovbjerg, Zheng-Hua Tan

(Submitted on 4 Oct 2022 (v1), revised 9 Dec 2022 (this version, v2), latest version 24 May 2023 (v3))

Abstract: In recent years, the development of accurate deep keyword spotting (KWS) models has led to KWS being embedded in a range of products, such as voice assistants. Many of these models rely on large amounts of labelled data to achieve good performance, which restricts their use to applications for which a large labelled speech data set can be obtained. Self-supervised learning seeks to reduce the need for large labelled data sets by leveraging unlabelled data, which is easier to obtain in large amounts. However, most self-supervised methods have only been investigated for very large models, whereas KWS models should be small. In this paper, we investigate the use of self-supervised pretraining for smaller KWS models in a label-deficient scenario. We pretrain the Keyword Transformer model using the self-supervised framework Data2Vec and carry out experiments on a label-deficient setup of the Google Speech Commands data set. The pretrained models greatly outperform the models without pretraining, showing that Data2Vec pretraining can increase the performance of KWS models in label-deficient scenarios. The source code is made publicly available.

Comments:	8 pages, 3 figures, 4 tables
Subjects:	Sound (cs.SD); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
MSC classes:	68T10
ACM classes:	I.2.6
Cite as:	arXiv:2210.01703 [cs.SD]
	(or arXiv:2210.01703v2 [cs.SD] for this version)

Submission history

From: Holger Severin Bovbjerg [view email]
[v1] Tue, 4 Oct 2022 15:56:27 GMT (63kb,D)
[v2] Fri, 9 Dec 2022 13:31:06 GMT (283kb,D)
[v3] Wed, 24 May 2023 12:17:31 GMT (88kb,D)
