Electrical Engineering and Systems Science > Audio and Speech Processing
Title: Investigating self-supervised front ends for speech spoofing countermeasures
(Submitted on 15 Nov 2021 (v1), last revised 4 Feb 2022 (this version, v3))
Abstract: Self-supervised speech modeling is a rapidly progressing research topic, and many pre-trained models have been released and used in various downstream tasks. For speech anti-spoofing, most countermeasures (CMs) use signal processing algorithms to extract acoustic features for classification. In this study, we use pre-trained self-supervised speech models as the front end of spoofing CMs. We investigated different back end architectures to be combined with the self-supervised front end, the effectiveness of fine-tuning the front end, and the performance of using different pre-trained self-supervised models. Our findings showed that, when a good pre-trained front end was fine-tuned with either a shallow or a deep neural network-based back end on the ASVspoof 2019 logical access (LA) training set, the resulting CM not only achieved a low equal error rate (EER) on the 2019 LA test set but also significantly outperformed the baseline on the ASVspoof 2015, 2021 LA, and 2021 deepfake test sets. A sub-band analysis further demonstrated that the CM mainly used the information in a specific frequency band to discriminate between bona fide and spoofed trials across the test sets.
Submission history
From: Xin Wang
[v1] Mon, 15 Nov 2021 12:52:50 GMT (331kb,D)
[v2] Sat, 20 Nov 2021 05:12:12 GMT (345kb,D)
[v3] Fri, 4 Feb 2022 13:25:23 GMT (781kb,D)