Zero-Shot Automatic Pronunciation Assessment

Liu, Hongfu; Shi, Mingqian; Wang, Ye

(Submitted on 31 May 2023)

Abstract: Automatic Pronunciation Assessment (APA) is vital for computer-assisted language learning. Prior methods rely on annotated speech-text data to train Automatic Speech Recognition (ASR) models or on speech-score data to train regression models. In this work, we propose a novel zero-shot APA method based on the pre-trained acoustic model HuBERT. Our method encodes the speech input and corrupts the encoding via a masking module. We then employ the Transformer encoder and apply k-means clustering to obtain token sequences. Finally, a scoring module measures the number of wrongly recovered tokens. Experimental results on speechocean762 demonstrate that the proposed method achieves performance comparable to supervised regression baselines and outperforms non-regression baselines in terms of Pearson Correlation Coefficient (PCC). Additionally, we analyze how masking strategies affect the performance of APA.
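
The mask, recover, and count idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`nearest_tokens`, `mismatch_score`, `smooth`) are hypothetical, the zero vector stands in for a learned mask embedding, and a three-frame moving average stands in for the HuBERT Transformer encoder. Only the overall flow (corrupt frames, contextualize, quantize against k-means centroids, count mismatched tokens) follows the abstract.

```python
import numpy as np

def nearest_tokens(frames, centroids):
    """Assign each frame the index of its closest k-means centroid."""
    # Squared Euclidean distance between every frame and every centroid.
    dists = ((frames[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

def mismatch_score(frames, centroids, contextualize, mask_start, mask_len):
    """Fraction of masked frames whose recovered token differs from the
    token obtained from the uncorrupted input."""
    span = slice(mask_start, mask_start + mask_len)
    # Reference tokens: contextualize and quantize the clean frames.
    reference = nearest_tokens(contextualize(frames), centroids)
    # Corrupt a contiguous span (assumption: zeros replace a mask embedding).
    corrupted = frames.copy()
    corrupted[span] = 0.0
    # Recovered tokens: contextualize and quantize the corrupted frames.
    recovered = nearest_tokens(contextualize(corrupted), centroids)
    return float((reference[span] != recovered[span]).mean())

def smooth(frames):
    """Toy stand-in for a Transformer encoder: average each frame with
    its immediate neighbours (edges are padded by repetition)."""
    padded = np.pad(frames, ((1, 1), (0, 0)), mode="edge")
    return (padded[:-2] + padded[1:-1] + padded[2:]) / 3.0

# Toy data: six identical 2-D frames and a two-entry codebook.
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])
frames = np.full((6, 2), 10.0)

short_mask = mismatch_score(frames, centroids, smooth, mask_start=2, mask_len=1)
long_mask = mismatch_score(frames, centroids, smooth, mask_start=2, mask_len=2)
print(short_mask, long_mask)
```

In this toy setting a single masked frame is recovered from its neighbours, while a two-frame mask is not, which hints at why the choice of masking strategy matters for the resulting score.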

Comments:	Accepted to Interspeech 2023
Subjects:	Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2305.19563 [cs.SD]
	(or arXiv:2305.19563v1 [cs.SD] for this version)

Submission history

From: Hongfu Liu
[v1] Wed, 31 May 2023 05:17:17 GMT (2651kb,D)
