Title: Bidirectional Multiscale Feature Aggregation for Speaker Verification

Abstract: In this paper, we propose a novel bidirectional multiscale feature aggregation (BMFA) network with attentional fusion modules for text-independent speaker verification. The feature maps from different stages of the backbone network are iteratively combined and refined in both a bottom-up and a top-down manner. Furthermore, instead of combining feature maps from different stages by simple concatenation or element-wise addition, an attentional fusion module is designed to compute the fusion weights. Experiments are conducted on the NIST SRE16 and VoxCeleb1 datasets. The experimental results demonstrate the effectiveness of the bidirectional aggregation strategy and show that the proposed attentional fusion module further improves performance.
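
To make the contrast with plain element-wise addition concrete, the following is a minimal sketch of one possible attentional fusion step. It is not the authors' module: the abstract does not specify how the fusion weights are computed, so the per-channel sigmoid gate, the function name attentional_fuse, and the parameters gate_weights and gate_bias are all assumptions made purely for illustration.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def attentional_fuse(x, y, gate_weights, gate_bias):
    """Fuse two feature maps shaped [channels][frames] with learned weights.

    Hypothetical gate (an assumption, not taken from the paper): for each
    channel, average the summed features over time, pass the result through
    a scalar affine map and a sigmoid to get a weight w, then return
    w * x + (1 - w) * y for that channel.
    """
    if len(x) != len(y) or any(len(a) != len(b) for a, b in zip(x, y)):
        raise ValueError("x and y must have the same shape")
    fused = []
    for c, (xc, yc) in enumerate(zip(x, y)):
        # Global average pooling over frames of the summed features.
        context = sum(a + b for a, b in zip(xc, yc)) / len(xc)
        # Per-channel fusion weight in (0, 1).
        w = sigmoid(gate_weights[c] * context + gate_bias[c])
        # Weighted combination instead of a fixed sum or concatenation.
        fused.append([w * a + (1.0 - w) * b for a, b in zip(xc, yc)])
    return fused
```

With the gate parameters set to zero, every weight is 0.5 and the sketch reduces to averaging the two maps, which is element-wise addition up to a constant factor; nonzero parameters let the weight depend on the input.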
Subjects: Audio and Speech Processing (eess.AS)
Cite as: arXiv:2104.00230 [eess.AS]
  (or arXiv:2104.00230v1 [eess.AS] for this version)

Submission history

From: Jiajun Qi
[v1] Thu, 1 Apr 2021 03:19:10 GMT (1064kb,D)