I3D: Transformer architectures with input-dependent dynamic depth for speech recognition

Peng, Yifan; Lee, Jaesong; Watanabe, Shinji

(Submitted on 14 Mar 2023)

Abstract: Transformer-based end-to-end speech recognition has achieved great success. However, the large footprint and computational overhead make it difficult to deploy these models in some real-world applications. Model compression techniques can reduce the model size and speed up inference, but the compressed model has a fixed architecture which might be suboptimal. We propose a novel Transformer encoder with Input-Dependent Dynamic Depth (I3D) to achieve strong performance-efficiency trade-offs. With a similar number of layers at inference time, I3D-based models outperform the vanilla Transformer and the static pruned model via iterative layer pruning. We also present interesting analysis on the gate probabilities and the input-dependency, which helps us better understand deep encoders.
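
The abstract does not spell out how the gates work, so the following is only a minimal sketch of the general idea of input-dependent depth, not the authors' implementation. It assumes a per-layer gate that looks at mean-pooled features and decides whether that layer runs or is skipped. Every name here (`toy_layer`, `dynamic_depth_encode`, the logistic gate, the 0.5 threshold) is hypothetical and chosen for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mean_pool(frames):
    # Average over time so that one gate decision is made per utterance.
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


def toy_layer(frames, scale):
    # Stand-in for an encoder layer (assumption): residual update x + scale * tanh(x).
    return [[v + scale * math.tanh(v) for v in f] for f in frames]


def dynamic_depth_encode(frames, layer_scales, gates, threshold=0.5):
    """Run only the layers whose gate probability reaches the threshold.

    gates is a list of (weights, bias) pairs, one per layer (assumed form).
    Returns the final features and the indices of the layers that ran.
    """
    x = frames
    executed = []
    for idx, (scale, (weights, bias)) in enumerate(zip(layer_scales, gates)):
        pooled = mean_pool(x)
        prob = sigmoid(sum(w * p for w, p in zip(weights, pooled)) + bias)
        if prob >= threshold:
            x = toy_layer(x, scale)
            executed.append(idx)
        # Otherwise the layer is skipped and x passes through unchanged.
    return x, executed


layer_scales = [0.1, 0.2, 0.3]
gates = [([0.0, 0.0], 5.0), ([0.0, 0.0], -5.0), ([1.0, 0.0], 0.0)]

_, ran_a = dynamic_depth_encode([[1.0, 2.0], [3.0, 4.0]], layer_scales, gates)
_, ran_b = dynamic_depth_encode([[-1.0, 2.0], [-3.0, 4.0]], layer_scales, gates)
print(ran_a, ran_b)
```

In this toy setup the two inputs execute different subsets of layers, which is the sense in which the depth depends on the input; a real system would learn the gate parameters jointly with the encoder.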

Comments:	Accepted at ICASSP 2023
Subjects:	Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2303.07624 [cs.CL]
	(or arXiv:2303.07624v1 [cs.CL] for this version)

Submission history

From: Yifan Peng
[v1] Tue, 14 Mar 2023 04:47:00 GMT (149kb,D)
