We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:

Download:

Current browse context:

cs.CL

Change to browse by:

References & Citations

DBLP - CS Bibliography

Bookmark

(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo

Computer Science > Computation and Language

Title: Enhancing Unsupervised Speech Recognition with Diffusion GANs

Authors: Xianchao Wu
Abstract: We enhance the vanilla adversarial training method for unsupervised Automatic Speech Recognition (ASR) by a diffusion-GAN. Our model (1) injects instance noises of various intensities to the generator's output and unlabeled reference text which are sampled from pretrained phoneme language models with a length constraint, (2) asks diffusion timestep-dependent discriminators to separate them, and (3) back-propagates the gradients to update the generator. Word/phoneme error rate comparisons with wav2vec-U under Librispeech (3.1% for test-clean and 5.6% for test-other), TIMIT and MLS datasets, show that our enhancement strategies work effectively.
Comments: 5 pages, 1 figure, accepted by ICASSP 2023
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as: arXiv:2303.13559 [cs.CL]
  (or arXiv:2303.13559v1 [cs.CL] for this version)

Submission history

From: Xianchao Wu [view email]
[v1] Thu, 23 Mar 2023 02:54:00 GMT (394kb,D)

Link back to: arXiv, form interface, contact.