Synthesizing Speech from Intracranial Depth Electrodes using an Encoder-Decoder Framework

Kohler, Jonas; Ottenhoff, Maarten C.; Goulis, Sophocles; Angrick, Miguel; Colon, Albert J.; Wagner, Louis; Tousseyn, Simon; Kubben, Pieter L.; Herff, Christian

doi:10.51628/001c.57524

(Submitted on 2 Nov 2021 (v1), last revised 31 Oct 2022 (this version, v4))

Abstract: Speech neuroprostheses have the potential to enable communication for people with dysarthria or anarthria. Recent advances have demonstrated high-quality text decoding and speech synthesis from electrocorticographic grids placed on the cortical surface. Here, we investigate a less invasive measurement modality, stereotactic EEG (sEEG), in three participants. sEEG provides sparse sampling from multiple brain regions, including subcortical regions. To evaluate whether sEEG can also be used to synthesize audio from neural recordings, we employ a recurrent encoder-decoder model based on modern deep learning methods. We find that speech can indeed be reconstructed with correlations of up to 0.8 from these minimally invasive recordings, despite limited amounts of training data. In particular, the architecture we employ naturally picks up on the temporal nature of the data and thereby outperforms an existing benchmark based on non-regressive convolutional neural networks.
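The abstract reports reconstruction quality as a correlation but does not state how it is computed. A common way to score synthesized speech against the original audio is a Pearson correlation computed separately for each frequency bin of the two spectrograms and then averaged. The sketch below assumes exactly that protocol; the function name `mean_spectral_correlation` and the (frames, bins) input layout are hypothetical and are not taken from the paper, whose full text defines the actual evaluation.

```python
import numpy as np


def mean_spectral_correlation(original, reconstructed):
    """Mean Pearson correlation between two spectrograms, computed per bin.

    Assumption (not taken from the paper): inputs are arrays of shape
    (n_frames, n_bins), e.g. spectrograms of the spoken and the
    synthesized audio, time-aligned frame by frame.
    """
    original = np.asarray(original, dtype=float)
    reconstructed = np.asarray(reconstructed, dtype=float)
    if original.shape != reconstructed.shape or original.ndim != 2:
        raise ValueError("expected two 2-D spectrograms of identical shape")

    scores = []
    for b in range(original.shape[1]):
        # Pearson r over time for a single frequency bin.
        r = np.corrcoef(original[:, b], reconstructed[:, b])[0, 1]
        scores.append(r)
    # Average the per-bin correlations into one summary score.
    return float(np.mean(scores))


if __name__ == "__main__":
    # Toy data only: a random "spectrogram" and a noisy copy of it.
    rng = np.random.default_rng(0)
    target = rng.random((200, 40))
    estimate = target + 0.1 * rng.standard_normal(target.shape)
    print(f"mean correlation: {mean_spectral_correlation(target, estimate):.2f}")
```

Averaging per-bin correlations, rather than correlating the flattened spectrograms, keeps loud low-frequency bins from dominating the score; whether the paper follows this convention is an assumption here.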

Subjects:	Sound (cs.SD); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
Cite as:	arXiv:2111.01457 [cs.SD]
	(or arXiv:2111.01457v4 [cs.SD] for this version)

Submission history

From: Jonas Kohler
[v1] Tue, 2 Nov 2021 09:43:21 GMT (13973kb,D)
[v2] Tue, 5 Jul 2022 11:18:11 GMT (14482kb,D)
[v3] Fri, 2 Sep 2022 15:00:56 GMT (14665kb,D)
[v4] Mon, 31 Oct 2022 13:15:13 GMT (14924kb,D)
