FBGEMM: Enabling High-Performance Low-Precision Deep Learning Inference

Khudia, Daya; Huang, Jianyu; Basu, Protonu; Deng, Summer; Liu, Haixin; Park, Jongsoo; Smelyanskiy, Mikhail

(Submitted on 13 Jan 2021)

Abstract: Deep learning models typically use single-precision (FP32) floating-point data types to represent activations and weights, but a large body of recent research has shown that computations with reduced-precision data types (FP16, 16-bit integers, 8-bit integers, or even 4- or 2-bit integers) are enough to achieve the same accuracy as FP32 and are much more efficient. We therefore designed FBGEMM, a kernel library built from the ground up for high-performance quantized inference on current-generation CPUs. FBGEMM achieves efficiency by fusing common quantization operations with a high-performance GEMM implementation and by generating shape- and size-specific kernel code at runtime. The library has been deployed at Facebook, where it delivers performance gains of more than 2x over our current production baseline.
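
To make the idea of fusing quantization operations with a GEMM concrete, the following is a minimal pure-Python sketch, not the library's API or implementation. It assumes an affine quantization scheme (real value ≈ scale * (q - zero_point)), accumulates integer products, and requantizes each output element to 8 bits inside the same loop instead of in a separate pass over an intermediate matrix. The function names (quantize, dequantize, quantized_gemm_fused) and all scales and zero points are hypothetical, chosen only for illustration.

```python
# Illustrative sketch only: names and parameters are assumptions,
# not part of the FBGEMM library.

def quantize(values, scale, zero_point, qmin, qmax):
    """Affine quantization: q = clamp(round(x / scale) + zero_point)."""
    return [
        [max(qmin, min(qmax, round(x / scale) + zero_point)) for x in row]
        for row in values
    ]


def dequantize(q, scale, zero_point):
    """Map quantized integers back to real values."""
    return [[scale * (v - zero_point) for v in row] for row in q]


def quantized_gemm_fused(a_q, a_scale, a_zp, b_q, b_scale, b_zp,
                         out_scale, out_zp):
    """Integer matrix multiply with requantization fused into the output loop."""
    m, k, n = len(a_q), len(b_q), len(b_q[0])
    # One real-valued multiplier converts the integer accumulator
    # to the output's quantized scale.
    multiplier = a_scale * b_scale / out_scale
    out = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0  # would be a 32-bit integer accumulator in a real kernel
            for p in range(k):
                acc += (a_q[i][p] - a_zp) * (b_q[p][j] - b_zp)
            # Fused step: requantize immediately, clamping to the uint8 range.
            out[i][j] = max(0, min(255, round(acc * multiplier) + out_zp))
    return out


# Small example: unsigned 8-bit activations, signed 8-bit weights.
a = [[1.0, 2.0], [3.0, 4.0]]
b = [[1.0, -1.0], [0.5, 2.0]]
a_q = quantize(a, 0.5, 3, 0, 255)
b_q = quantize(b, 0.5, 0, -128, 127)
out_q = quantized_gemm_fused(a_q, 0.5, 3, b_q, 0.5, 0, 0.5, 10)
out = dequantize(out_q, 0.5, 10)
```

In this example the inputs are exactly representable at the chosen scales, so the dequantized result equals the FP32 product; in general the quantized result is only an approximation.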

Subjects:	Machine Learning (cs.LG); Performance (cs.PF)
Cite as:	arXiv:2101.05615 [cs.LG]
	(or arXiv:2101.05615v1 [cs.LG] for this version)

Submission history

From: Daya Khudia
[v1] Wed, 13 Jan 2021 00:34:04 GMT (282kb,D)
