WakeUpNet: A Mobile-Transformer based Framework for End-to-End Streaming Voice Trigger

Zhang, Zixing; Farnsworth, Thorin; Lin, Senling; Karout, Salah

(Submitted on 6 Oct 2022)

Abstract: End-to-end models have gradually become the mainstream approach to voice trigger, aiming to achieve the highest possible prediction accuracy with a small footprint. In the present paper, we propose an end-to-end voice trigger framework, namely WakeUpNet, which is built on a Transformer encoder. The purpose of this framework is to explore the context-capturing capability of the Transformer, as sequential information is vital for wake-up word detection. However, the conventional Transformer encoder is too large to fit our task. To address this issue, we introduce different model compression approaches to shrink the vanilla encoder into a tiny one, called mobile-Transformer. To evaluate the performance of mobile-Transformer, we conduct extensive experiments on a large publicly available dataset, HiMia. The obtained results indicate that the introduced mobile-Transformer significantly outperforms other frequently used models for voice trigger in both clean and noisy scenarios.

Subjects:	Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2210.02904 [cs.SD]
	(or arXiv:2210.02904v1 [cs.SD] for this version)

Submission history

From: Zixing Zhang [view email]
[v1] Thu, 6 Oct 2022 13:18:48 GMT (149kb,D)
