
Title: FBGEMM: Enabling High-Performance Low-Precision Deep Learning Inference

Abstract: Deep learning models typically use single-precision (FP32) floating-point data types to represent activations and weights, but a slew of recent research has shown that computations with reduced-precision data types (FP16, 16-bit integers, 8-bit integers, or even 4- or 2-bit integers) are enough to achieve the same accuracy as FP32 and are much more efficient. Therefore, we designed fbgemm, a high-performance kernel library, from the ground up to perform high-performance quantized inference on current-generation CPUs. fbgemm achieves efficiency by fusing common quantization operations with a high-performance gemm implementation and by shape- and size-specific kernel code generation at runtime. The library has been deployed at Facebook, where it delivers greater than 2x performance gains with respect to our current production baseline.
Subjects: Machine Learning (cs.LG); Performance (cs.PF)
Cite as: arXiv:2101.05615 [cs.LG]
  (or arXiv:2101.05615v1 [cs.LG] for this version)
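
The fusion idea mentioned in the abstract can be illustrated with a minimal pure-Python sketch. This is not fbgemm's API or implementation: the function names (`quantize`, `dequantize`, `quantized_gemm_fused`), the unsigned 8-bit affine quantization scheme, and the per-tensor scale and zero-point parameters are all assumptions chosen for illustration. The sketch accumulates integer products for each output element and requantizes that element immediately, rather than writing a full 32-bit intermediate matrix and converting it in a separate pass.

```python
def quantize(x, scale, zero_point, qmin=0, qmax=255):
    """Affine-quantize a real value to an unsigned 8-bit integer (assumed scheme)."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Map a quantized integer back to an approximate real value."""
    return scale * (q - zero_point)


def quantized_gemm_fused(a_q, b_q, a_params, b_params, c_params):
    """Compute C_q = requantize(A_q x B_q) with requantization fused into the loop.

    Each *_params argument is a hypothetical (scale, zero_point) pair.
    """
    a_scale, a_zp = a_params
    b_scale, b_zp = b_params
    c_scale, c_zp = c_params
    # Single real-valued multiplier that rescales the integer accumulator
    # into the output's quantized domain.
    multiplier = a_scale * b_scale / c_scale

    m, k, n = len(a_q), len(b_q), len(b_q[0])
    c_q = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0  # stands in for a 32-bit integer accumulator
            for p in range(k):
                acc += (a_q[i][p] - a_zp) * (b_q[p][j] - b_zp)
            # Fused step: requantize right after accumulation, so no
            # intermediate 32-bit matrix is ever materialized.
            c_q[i][j] = max(0, min(255, round(acc * multiplier) + c_zp))
    return c_q


# Usage: quantize two small FP matrices, multiply, and dequantize the result.
a = [[0.5, -1.0], [1.5, 2.0]]
b = [[1.0, 0.0], [-0.5, 2.0]]
a_params, b_params, c_params = (0.02, 128), (0.02, 128), (0.05, 128)

a_q = [[quantize(v, *a_params) for v in row] for row in a]
b_q = [[quantize(v, *b_params) for v in row] for row in b]
c_q = quantized_gemm_fused(a_q, b_q, a_params, b_params, c_params)
c = [[dequantize(v, *c_params) for v in row] for row in c_q]
print(c)  # close to the FP32 product of a and b, within one output quantization step
```

A production kernel would replace the inner loops with vectorized integer instructions and would typically apply the rescaling in fixed-point arithmetic. The scalar loop above only shows where the quantization step sits relative to the accumulation.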

Submission history

From: Daya Khudia
[v1] Wed, 13 Jan 2021 00:34:04 GMT (282kb,D)
