Replacing Human Audio with Synthetic Audio for On-device Unspoken Punctuation Prediction

Soboleva, Daria; Skopek, Ondrej; Šajgalík, Márius; Cărbune, Victor; Weissenberger, Felix; Proskurnia, Julia; Prisacari, Bogdan; Valcarce, Daniel; Lu, Justin; Prabhavalkar, Rohit; Miklos, Balint

Computer Science > Machine Learning

(Submitted on 20 Oct 2020 (v1), last revised 10 Feb 2021 (this version, v2))

Abstract: We present a novel multi-modal unspoken punctuation prediction system for the English language that combines acoustic and text features. We demonstrate, for the first time, that by relying exclusively on synthetic data generated using a prosody-aware text-to-speech system, we can outperform a model trained with expensive human audio recordings on the unspoken punctuation prediction problem. Our model architecture is well suited for on-device use. This is achieved by feeding hash-based embeddings of automatic speech recognition text output, together with acoustic features, into a quasi-recurrent neural network, which keeps the model size small and latency low.
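
To illustrate why hash-based embeddings help keep a model small, the following is a minimal sketch, not the authors' implementation. It assumes a simple hashing-trick lookup (a stable hash of each token modulo a fixed number of buckets) and assumes the text embedding is concatenated with per-token acoustic features. The names `NUM_BUCKETS`, `EMBED_DIM`, `bucket_for`, `embed`, and `fuse`, as well as all sizes, are hypothetical and do not come from the paper.

```python
import hashlib
import random
from typing import List

NUM_BUCKETS = 1024  # hypothetical table size, not taken from the paper
EMBED_DIM = 8       # hypothetical embedding width, not taken from the paper


def bucket_for(token: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a token to a bucket with a stable hash, so no vocabulary is stored."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


# Randomly initialised table; in a trained model these rows would be learned.
_rng = random.Random(0)
TABLE: List[List[float]] = [
    [_rng.uniform(-0.1, 0.1) for _ in range(EMBED_DIM)]
    for _ in range(NUM_BUCKETS)
]


def embed(token: str) -> List[float]:
    """Look up the embedding row for the token's hash bucket."""
    return TABLE[bucket_for(token)]


def fuse(token: str, acoustic: List[float]) -> List[float]:
    """Concatenate the text embedding with acoustic features (an assumption)."""
    return embed(token) + list(acoustic)
```

In this sketch the table size is fixed regardless of how many distinct words the speech recognizer emits, at the cost of occasional collisions between unrelated tokens. The fused vectors would then be the per-token inputs to a sequence model such as a quasi-recurrent neural network.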

Comments:	Accepted to IEEE ICASSP 2021
Subjects:	Machine Learning (cs.LG); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2010.10203 [cs.LG]
	(or arXiv:2010.10203v2 [cs.LG] for this version)

Submission history

From: Ondrej Skopek [view email]
[v1] Tue, 20 Oct 2020 11:30:26 GMT (67kb,D)
[v2] Wed, 10 Feb 2021 21:01:35 GMT (68kb,D)
