We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:

Download:

Current browse context:

cs.CL

Change to browse by:

References & Citations

DBLP - CS Bibliography

Bookmark

(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo

Computer Science > Computation and Language

Title: InterFormer: Interactive Local and Global Features Fusion for Automatic Speech Recognition

Abstract: The local and global features are both essential for automatic speech recognition (ASR). Many recent methods have verified that simply combining local and global features can further promote ASR performance. However, these methods pay less attention to the interaction of local and global features, and their series architectures are rigid to reflect local and global relationships. To address these issues, this paper proposes InterFormer for interactive local and global features fusion to learn a better representation for ASR. Specifically, we combine the convolution block with the transformer block in a parallel design. Besides, we propose a bidirectional feature interaction module (BFIM) and a selective fusion module (SFM) to implement the interaction and fusion of local and global features, respectively. Extensive experiments on public ASR datasets demonstrate the effectiveness of our proposed InterFormer and its superior performance over the other Transformer and Conformer models.
Comments: Accepted by Interspeech 2023
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as: arXiv:2305.16342 [cs.CL]
  (or arXiv:2305.16342v2 [cs.CL] for this version)

Submission history

From: Zhihao Lai [view email]
[v1] Wed, 24 May 2023 08:43:44 GMT (977kb,D)
[v2] Mon, 29 May 2023 11:28:27 GMT (977kb,D)

Link back to: arXiv, form interface, contact.