Learning Multiscale Transformer Models for Sequence Generation

Li, Bei; Zheng, Tong; Jing, Yi; Jiao, Chengbo; Xiao, Tong; Zhu, Jingbo

(Submitted on 19 Jun 2022)

Abstract: Multiscale feature hierarchies have proven successful in computer vision. This has motivated researchers to design multiscale Transformers for natural language processing, mostly based on the self-attention mechanism, for example by restricting the receptive field across heads or by extracting local fine-grained features via convolutions. However, most existing work models local features directly and ignores word-boundary information. This results in redundant and ambiguous attention distributions that lack interpretability. In this work, we define the scales in terms of different linguistic units: sub-words, words, and phrases. We build a multiscale Transformer model by establishing relationships among scales based on word-boundary information and phrase-level prior knowledge. The proposed Universal MultiScale Transformer (UMST) was evaluated on two sequence generation tasks. Notably, it yielded consistent performance gains over a strong baseline on several test sets without sacrificing efficiency.

Comments:	Accepted to ICML 2022
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2206.09337 [cs.CL]
	(or arXiv:2206.09337v1 [cs.CL] for this version)

Submission history

From: Li Bei
[v1] Sun, 19 Jun 2022 07:28:54 GMT (3917kb,D)
