Learning Phone Recognition from Unpaired Audio and Phone Sequences Based on Generative Adversarial Network

Liu, Da-rong; Hsu, Po-chun; Chen, Yi-chen; Huang, Sung-feng; Chuang, Shun-po; Wu, Da-yi; Lee, Hung-yi



(Submitted on 29 Jul 2022)

Abstract: Automatic speech recognition (ASR) systems have recently achieved strong performance. However, most of them rely on massive amounts of paired data, which is not feasible to obtain for low-resource languages worldwide. This paper investigates how to learn directly from unpaired phone sequences and speech utterances. We design a two-stage iterative framework. In the first stage, generative adversarial network (GAN) training is used to find the mapping between unpaired speech and phone sequences. In the second stage, a separate hidden Markov model (HMM) is trained on the generator's output, which boosts performance and provides a better segmentation for the next iteration. In the experiments, we first investigate different model design choices. We then compare the framework against three types of baselines: (i) supervised methods, (ii) methods based on acoustic unit discovery, and (iii) methods that learn from unpaired data. On the TIMIT dataset, our framework consistently outperforms all acoustic unit discovery methods and previous methods that learn from unpaired data.
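
The alternation between the two stages described in the abstract can be sketched as a simple control loop. This is only an illustrative skeleton, not the authors' implementation: the function name `iterate_two_stage`, the callables `train_gan` and `train_hmm`, the `num_iterations` parameter, and the way the initial segmentation is supplied are all assumptions, since the abstract does not specify them.

```python
from typing import Callable, List, Tuple, TypeVar

Seg = TypeVar("Seg")  # hypothetical type for a segmentation of the audio
Gen = TypeVar("Gen")  # hypothetical type for a trained GAN generator


def iterate_two_stage(
    initial_segmentation: Seg,
    train_gan: Callable[[Seg], Gen],
    train_hmm: Callable[[Gen], Seg],
    num_iterations: int,
) -> Tuple[Seg, List[Seg]]:
    """Alternate the two stages for a fixed number of iterations.

    train_gan: stage 1 (assumed interface). Takes the current segmentation
        and returns a generator mapping speech to phone sequences.
    train_hmm: stage 2 (assumed interface). Trains an HMM on the generator's
        output and returns a refined segmentation for the next iteration.
    """
    segmentation = initial_segmentation
    history = [segmentation]  # segmentation used or produced at each step
    for _ in range(num_iterations):
        generator = train_gan(segmentation)  # stage 1: GAN training
        segmentation = train_hmm(generator)  # stage 2: HMM refinement
        history.append(segmentation)
    return segmentation, history
```

Passing the two stages in as callables keeps the sketch independent of any particular GAN or HMM toolkit; the only thing it captures is that each iteration's HMM output feeds the next iteration's GAN training.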

Subjects:	Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2207.14568 [cs.SD]
	(or arXiv:2207.14568v1 [cs.SD] for this version)

Submission history

From: Da-Rong Liu
[v1] Fri, 29 Jul 2022 09:29:28 GMT (3848kb,D)
