Title: Improving Label-Deficient Keyword Spotting Using Self-Supervised Pretraining

Abstract: In recent years, the development of accurate deep keyword spotting (KWS) models has led to KWS technology being embedded in a number of products, such as voice assistants. Many of these models rely on large amounts of labelled data to achieve good performance. As a result, their use is restricted to applications for which a large labelled speech data set can be obtained. Self-supervised learning seeks to reduce the need for large labelled data sets by leveraging unlabelled data, which is easier to obtain in large amounts. However, most self-supervised methods have only been investigated for very large models, whereas KWS models should be small. In this paper, we investigate the use of self-supervised pretraining for smaller KWS models in a label-deficient scenario. We pretrain the Keyword Transformer model using the self-supervised framework Data2Vec and carry out experiments on a label-deficient setup of the Google Speech Commands data set. We find that the pretrained models greatly outperform the models without pretraining, showing that Data2Vec pretraining can increase the performance of KWS models in label-deficient scenarios. The source code is made publicly available.
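Data2Vec-style pretraining trains a student network to regress targets produced by a teacher, which is an exponential-moving-average (EMA) copy of the student. The targets are the average of the teacher's top-K block outputs, each instance-normalized, and the loss is computed only at the time steps that were masked in the student's input. The NumPy sketch below illustrates those pieces under stated assumptions. It is not the authors' released code: the function names are hypothetical, random arrays stand in for Keyword Transformer activations, the EMA rate `tau` is held fixed, and a plain squared-error loss is used for simplicity (the original Data2Vec formulation uses a Smooth L1 loss whose transition point is a hyperparameter).

```python
import numpy as np


def ema_update(teacher: dict, student: dict, tau: float) -> dict:
    """Teacher weights track the student: teacher <- tau * teacher + (1 - tau) * student."""
    return {k: tau * teacher[k] + (1.0 - tau) * student[k] for k in teacher}


def instance_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Normalize each feature over the time axis; x has shape [time, dim]."""
    mean = x.mean(axis=0, keepdims=True)
    std = x.std(axis=0, keepdims=True)
    return (x - mean) / (std + eps)


def data2vec_targets(teacher_layer_outputs: list, top_k: int) -> np.ndarray:
    """Average the instance-normalized outputs of the teacher's top-K blocks.

    Assumption: the teacher sees the unmasked input, so targets exist at every step.
    """
    top = teacher_layer_outputs[-top_k:]
    return np.mean([instance_norm(h) for h in top], axis=0)


def masked_regression_loss(student_pred: np.ndarray, targets: np.ndarray,
                           mask: np.ndarray) -> float:
    """Squared error averaged over masked time steps only (simplified loss)."""
    diff = student_pred[mask] - targets[mask]
    return float(np.mean(diff ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, D, L = 10, 4, 6  # hypothetical: time steps, embedding size, transformer blocks
    teacher_layers = [rng.normal(size=(T, D)) for _ in range(L)]
    targets = data2vec_targets(teacher_layers, top_k=3)
    student_pred = rng.normal(size=(T, D))  # stand-in for the student's predictions
    mask = np.zeros(T, dtype=bool)
    mask[2:6] = True  # hypothetical masked span
    print("loss:", masked_regression_loss(student_pred, targets, mask))
```

Restricting the loss to masked steps is what forces the student to infer the teacher's contextual representation from surrounding audio rather than copy its own input.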
Comments: 8 pages, 3 figures, 4 tables
Subjects: Sound (cs.SD); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
MSC classes: 68T10
ACM classes: I.2.6
Cite as: arXiv:2210.01703 [cs.SD]
  (or arXiv:2210.01703v2 [cs.SD] for this version)

Submission history

From: Holger Severin Bovbjerg
[v1] Tue, 4 Oct 2022 15:56:27 GMT (63kb,D)
[v2] Fri, 9 Dec 2022 13:31:06 GMT (283kb,D)
[v3] Wed, 24 May 2023 12:17:31 GMT (88kb,D)
