Adaptation of Tongue Ultrasound-Based Silent Speech Interfaces Using Spatial Transformer Networks

Tóth, László; Shandiz, Amin Honarmandi; Gosztolya, Gábor; Gábor, Csapó Tamás

doi:10.21437/Interspeech.2023-1607

(Submitted on 30 May 2023 (v1), last revised 17 Oct 2023 (this version, v3))

Abstract: Thanks to recent deep learning algorithms, silent speech interfaces (SSI) are now able to synthesize intelligible speech from articulatory movement data under certain conditions. However, the resulting models are rather speaker-specific, which makes quick switching between users troublesome. Even for the same speaker, these models perform poorly across sessions, i.e., after the recording equipment has been dismounted and re-mounted. To aid quick speaker and session adaptation of ultrasound tongue imaging-based SSI models, we extend our deep networks with a spatial transformer network (STN) module capable of performing an affine transformation on the input images. Although the STN part makes up only about 10% of the network, our experiments show that adapting just the STN module might reduce the MSE by 88% on average, compared to retraining the whole network. The improvement is even larger (around 92%) when adapting the network to different recording sessions of the same speaker.
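
To make the affine-transformation idea concrete, below is a minimal pure-Python sketch of the sampling step of a spatial transformer: a 2×3 matrix `theta` maps each output pixel's normalized coordinates in [-1, 1] to a source location, which is then read with bilinear interpolation. The function names (`affine_grid`, `bilinear_sample`, `spatial_transform`) and the list-of-lists image representation are illustrative assumptions, not the authors' code. In an actual STN, `theta` is predicted by a small localization network, and adapting only the STN would mean fine-tuning that part while the rest of the model stays frozen.

```python
import math
from typing import List, Tuple

Image = List[List[float]]
Theta = List[List[float]]  # hypothetical 2x3 affine matrix [[a, b, tx], [c, d, ty]]


def affine_grid(theta: Theta, height: int, width: int) -> List[List[Tuple[float, float]]]:
    """Map every output pixel's normalized (x, y) in [-1, 1] to a source location."""
    grid = []
    for i in range(height):
        y_t = -1.0 + 2.0 * i / (height - 1) if height > 1 else 0.0
        row = []
        for j in range(width):
            x_t = -1.0 + 2.0 * j / (width - 1) if width > 1 else 0.0
            x_s = theta[0][0] * x_t + theta[0][1] * y_t + theta[0][2]
            y_s = theta[1][0] * x_t + theta[1][1] * y_t + theta[1][2]
            row.append((x_s, y_s))
        grid.append(row)
    return grid


def bilinear_sample(img: Image, x_s: float, y_s: float) -> float:
    """Read img at normalized (x_s, y_s); pixels outside the image count as 0."""
    h, w = len(img), len(img[0])
    x = (x_s + 1.0) * (w - 1) / 2.0  # back to pixel coordinates
    y = (y_s + 1.0) * (h - 1) / 2.0
    x0, y0 = math.floor(x), math.floor(y)
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                # Weights of the four neighbours sum to 1 inside the image.
                weight = (1.0 - abs(x - xx)) * (1.0 - abs(y - yy))
                value += weight * img[yy][xx]
    return value


def spatial_transform(img: Image, theta: Theta) -> Image:
    """Warp img with the affine matrix theta (output has the same size as input)."""
    h, w = len(img), len(img[0])
    grid = affine_grid(theta, h, w)
    return [[bilinear_sample(img, xs, ys) for (xs, ys) in row] for row in grid]


if __name__ == "__main__":
    frame = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
    shift = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]  # translate by one pixel step in x
    print(spatial_transform(frame, shift))  # [[2.0, 3.0, 0.0], [5.0, 6.0, 0.0]]
```

With `theta` set to the identity the image passes through unchanged, while a nonzero translation term shifts the content; small shifts, rotations, and scalings of this kind are plausibly what a re-mounted probe introduces, which is why a handful of affine parameters can absorb part of the session mismatch.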

Comments:	5 pages, 3 figures, 3 tables
Subjects:	Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Journal reference:	the Proceedings of Interspeech 2023
DOI:	10.21437/Interspeech.2023-1607
Cite as:	arXiv:2305.19130 [cs.SD]
	(or arXiv:2305.19130v3 [cs.SD] for this version)

Submission history

From: Amin Honarmandi Shandiz
[v1] Tue, 30 May 2023 15:41:47 GMT (610kb)
[v2] Wed, 31 May 2023 07:51:32 GMT (610kb)
[v3] Tue, 17 Oct 2023 08:01:34 GMT (610kb)
