A Comparison Study on Infant-Parent Voice Diarization

Zhu, Junzhe; Hasegawa-Johnson, Mark; McElwain, Nancy

Electrical Engineering and Systems Science > Audio and Speech Processing

Title: A Comparison Study on Infant-Parent Voice Diarization

Authors: Junzhe Zhu, Mark Hasegawa-Johnson, Nancy McElwain

(Submitted on 5 Nov 2020)

Abstract: We design a framework for studying prelinguistic child voice from 3 to 24 months based on state-of-the-art algorithms in diarization. Our system consists of a time-invariant feature extractor, a context-dependent embedding generator, and a classifier. We study the effect of swapping out different components of the system, as well as changing the loss function, to find the best performance. We also present a multiple-instance learning technique that allows us to pre-train our parameters on larger datasets with coarser segment boundary labels. We found that our best system achieved 43.8% DER on the test dataset, compared to 55.4% DER achieved by LENA software. We also found that using a convolutional feature extractor instead of logmel features significantly increases the performance of neural diarization.
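
For readers unfamiliar with the metric, the following is a minimal sketch of how a diarization error rate (DER) can be computed at the frame level, and of the relative improvement implied by the two figures in the abstract. It assumes one label per frame, no overlapping speech, and no forgiveness collar; the label names ("chi", "mot", "sil") and both function names are hypothetical, and the abstract does not state the exact scoring protocol the authors used.

```python
def frame_level_der(reference, hypothesis, non_speech="sil"):
    """Simplified frame-level DER: (missed + false alarm + confusion) / reference speech frames.

    Assumes one label per frame and fixed class labels, so no speaker mapping is needed.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same number of frames")

    missed = false_alarm = confusion = speech = 0
    for ref, hyp in zip(reference, hypothesis):
        if ref != non_speech:
            speech += 1
            if hyp == non_speech:
                missed += 1          # reference speech labeled as non-speech
            elif hyp != ref:
                confusion += 1       # speech attributed to the wrong speaker class
        elif hyp != non_speech:
            false_alarm += 1         # non-speech labeled as speech

    if speech == 0:
        raise ValueError("reference contains no speech frames")
    return (missed + false_alarm + confusion) / speech


def relative_reduction(baseline, system):
    """Relative error reduction of `system` with respect to `baseline`."""
    return (baseline - system) / baseline


# Toy example with hypothetical labels: child ("chi"), mother ("mot"), silence ("sil").
ref = ["chi", "chi", "sil", "mot", "mot", "sil"]
hyp = ["chi", "mot", "chi", "mot", "sil", "sil"]
toy_der = frame_level_der(ref, hyp)

# Figures reported in the abstract: 43.8% DER (best system) vs. 55.4% DER (LENA).
reported_gain = relative_reduction(55.4, 43.8)
```

Under these assumptions, the reported figures correspond to an absolute reduction of 11.6 percentage points, or roughly 21% relative to the LENA baseline.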

Comments:	ICASSP 2021
Subjects:	Audio and Speech Processing (eess.AS); Sound (cs.SD)
Cite as:	arXiv:2011.02698 [eess.AS]
	(or arXiv:2011.02698v1 [eess.AS] for this version)

Submission history

From: Junzhe Zhu [view email]
[v1] Thu, 5 Nov 2020 08:21:05 GMT (285kb,D)
