LightSeq: A High Performance Inference Library for Transformers

Wang, Xiaohui; Xiong, Ying; Wei, Yang; Wang, Mingxuan; Li, Lei



(Submitted on 23 Oct 2020 (v1), last revised 22 Apr 2021 (this version, v4))

Abstract: Transformer, BERT and their variants have achieved great success in natural language processing. Because Transformer models are large, serving them efficiently is a challenge for real industrial applications. In this paper, we propose LightSeq, a highly efficient inference library for models in the Transformer family. LightSeq includes a series of GPU optimization techniques that streamline the computation of neural layers and reduce memory footprint. LightSeq can easily import models trained using PyTorch and TensorFlow. Experimental results on machine translation benchmarks show that LightSeq achieves up to 14x speedup compared with TensorFlow and 1.4x speedup compared with FasterTransformer, a concurrent CUDA implementation. The code is available at this https URL

Comments:	8 pages, 6 figures, accepted by NAACL 2021 Industry Track
Subjects:	Mathematical Software (cs.MS); Machine Learning (cs.LG)
Cite as:	arXiv:2010.13887 [cs.MS]
	(or arXiv:2010.13887v4 [cs.MS] for this version)

Submission history

From: Yang Wei
[v1] Fri, 23 Oct 2020 13:45:26 GMT (9857kb,D)
[v2] Wed, 28 Oct 2020 02:47:33 GMT (10283kb,D)
[v3] Tue, 30 Mar 2021 07:17:59 GMT (19868kb,D)
[v4] Thu, 22 Apr 2021 09:37:37 GMT (19873kb,D)
