Learnable Spectro-temporal Receptive Fields for Robust Voice Type Discrimination

Vuong, Tyler; Xia, Yangyang; Stern, Richard

doi:10.21437/Interspeech.2020-1878

(Submitted on 19 Oct 2020)

Abstract: Voice Type Discrimination (VTD) refers to distinguishing regions of a recording in which speech was produced by speakers physically close to the recording device ("Live Speech") from regions containing speech and other types of audio that were played back, such as traffic noise and television broadcasts ("Distractor Audio"). In this work, we propose a deep-learning-based VTD system whose initial layer consists of learnable spectro-temporal receptive fields (STRFs). We evaluate our approach on a new standardized VTD database that was collected to support research in this area. In particular, we study the effect of using learnable STRFs compared to static STRFs or unconstrained kernels. Our approach also performs very strongly on a similar spoofing detection task from the ASVspoof 2019 challenge. We further show that our system consistently improves upon a competitive baseline system across a wide range of signal-to-noise ratios for spoofing detection in the presence of VTD distractor noise.
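The abstract does not say how the STRFs are parameterized. Purely as an illustration, the sketch below assumes each STRF is a 2-D Gabor kernel over a log-frequency spectrogram, controlled by two hypothetical parameters: a temporal modulation rate (Hz) and a spectral modulation scale (cycles per octave). It shows only the forward computation in NumPy. In a learnable-STRF layer these two numbers per kernel would be updated by backpropagation, a static-STRF layer would keep them fixed, and an unconstrained kernel would instead learn every kernel entry directly. The function names `gabor_strf` and `strf_layer`, the kernel size, frame rate, bins-per-octave value and the fixed Gaussian envelope are all assumptions, not the authors' implementation.

```python
import numpy as np


def gabor_strf(rate_hz, scale_cpo, n_time=25, n_freq=13,
               frame_rate=100.0, bins_per_octave=12.0):
    """Return one real-valued Gabor STRF kernel of shape (n_freq, n_time).

    Hypothetical parameterization (not taken from the paper):
      rate_hz:   temporal modulation frequency in Hz
      scale_cpo: spectral modulation frequency in cycles per octave
    Assumes 100 frames/s and frequency bins spaced 1/12 octave apart.
    """
    # Time axis (seconds) and frequency axis (octaves), both centred on zero.
    t = (np.arange(n_time) - (n_time - 1) / 2) / frame_rate
    f = (np.arange(n_freq) - (n_freq - 1) / 2) / bins_per_octave
    T, F = np.meshgrid(t, f)  # both have shape (n_freq, n_time)

    # Fixed Gaussian envelope covering the kernel support (an assumption).
    sigma_t = t.max() / 2 if n_time > 1 else 1.0
    sigma_f = f.max() / 2 if n_freq > 1 else 1.0
    envelope = np.exp(-0.5 * ((T / sigma_t) ** 2 + (F / sigma_f) ** 2))

    # Cosine carrier modulated jointly in time (rate) and frequency (scale).
    kernel = envelope * np.cos(2 * np.pi * (rate_hz * T + scale_cpo * F))

    # Subtract the mean so a constant input produces zero response.
    return kernel - kernel.mean()


def strf_layer(spec, rates, scales, **kernel_kwargs):
    """Filter a (n_bins, n_frames) log-spectrogram with a bank of Gabor STRFs.

    Returns an array of shape (n_kernels, n_bins - kf + 1, n_frames - kt + 1),
    i.e. a 'valid' 2-D cross-correlation with each kernel.
    """
    kernels = np.stack([gabor_strf(r, s, **kernel_kwargs)
                        for r, s in zip(rates, scales)])
    kf, kt = kernels.shape[1:]
    # Every (kf, kt) patch of the spectrogram: shape (out_f, out_t, kf, kt).
    patches = np.lib.stride_tricks.sliding_window_view(spec, (kf, kt))
    # Dot product of each patch with each kernel.
    return np.einsum("ijkl,nkl->nij", patches, kernels)


# Usage with a random stand-in for a 64-bin log spectrogram of 200 frames.
rng = np.random.default_rng(0)
spec = rng.standard_normal((64, 200))
rates = np.array([2.0, 4.0, 8.0, 16.0])   # Hz (illustrative values)
scales = np.array([0.5, 1.0, 2.0, 4.0])   # cycles/octave (illustrative values)
out = strf_layer(spec, rates, scales)
print(out.shape)  # (4, 52, 176)
```

Under these assumptions, each kernel is described by two numbers instead of 13 × 25 free weights, which is one way to read the contrast the abstract draws between learnable STRFs and unconstrained kernels.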

Comments:	Accepted at Interspeech 2020. Video: this http URL&c=index&a=show&catid=311&id=712
Subjects:	Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2010.09151 [eess.AS]
	(or arXiv:2010.09151v1 [eess.AS] for this version)

Submission history

From: Tyler Vuong
[v1] Mon, 19 Oct 2020 00:29:02 GMT (5454kb,D)
