Transport-Oriented Feature Aggregation for Speaker Embedding Learning

Tian, Yusheng; Li, Jingyu; Lee, Tan

(Submitted on 26 Jun 2022)

Abstract: Pooling is needed to aggregate frame-level features into utterance-level representations for speaker modeling. Given the success of statistics-based pooling methods, we hypothesize that speaker characteristics are well represented in the statistical distribution over the pre-aggregation layer's output, and propose to use transport-oriented feature aggregation for deriving speaker embeddings. The aggregated representation encodes the geometric structure of the underlying feature distribution, which is expected to contain valuable speaker-specific information that may not be represented by the commonly used statistical measures like mean and variance. The original transport-oriented feature aggregation is also extended to a weighted-frame version to incorporate the attention mechanism. Experiments on speaker verification with the VoxCeleb dataset show improvement over statistics pooling and its attentive variant.
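
For orientation, the statistics pooling baseline that the abstract compares against can be sketched as below. This is only an illustrative sketch of the baseline, not the authors' code and not the proposed transport-oriented aggregation. The function name `statistics_pooling`, the `scores` argument, the softmax normalization of those scores, and the `eps` constant are all assumptions made for this example. Pure Python is used so the sketch has no dependencies; a real system would use a tensor library.

```python
import math


def statistics_pooling(frames, scores=None, eps=1e-8):
    """Pool T frame-level vectors of dimension D into one 2*D vector [mean, std].

    frames : list of T lists, each of length D (hypothetical frame-level features).
    scores : optional list of T unnormalized attention scores (an assumption of
             this sketch); None gives uniform weights, i.e. plain statistics pooling.
    eps    : small constant added to the variance before the square root.
    """
    num_frames = len(frames)
    dim = len(frames[0])

    if scores is None:
        probs = [1.0 / num_frames] * num_frames
    else:
        # Softmax over frames, shifted by the max score for numerical stability.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        probs = [e / total for e in exps]

    # Weighted mean and weighted (population) variance per feature dimension.
    mean = [sum(p * f[d] for p, f in zip(probs, frames)) for d in range(dim)]
    var = [
        sum(p * (f[d] - mean[d]) ** 2 for p, f in zip(probs, frames))
        for d in range(dim)
    ]
    std = [math.sqrt(v + eps) for v in var]

    # Utterance-level representation: concatenation of mean and std.
    return mean + std
```

With `scores=None` every frame counts equally; passing scores yields an attentive variant in which frames with higher scores contribute more to both statistics. Either way the output keeps only first- and second-order statistics per dimension, which is the limitation the abstract points to.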

Comments:	Accepted for presentation at INTERSPEECH 2022
Subjects:	Audio and Speech Processing (eess.AS); Sound (cs.SD)
Cite as:	arXiv:2206.12857 [eess.AS]
	(or arXiv:2206.12857v1 [eess.AS] for this version)

Submission history

From: Yusheng Tian
[v1] Sun, 26 Jun 2022 12:22:53 GMT (263kb,D)
