Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference

Jacob, Benoit; Kligys, Skirmantas; Chen, Bo; Zhu, Menglong; Tang, Matthew; Howard, Andrew; Adam, Hartwig; Kalenichenko, Dmitry

(Submitted on 15 Dec 2017)

Abstract: The rising popularity of intelligent mobile devices and the daunting computational cost of deep learning-based models call for efficient and accurate on-device inference schemes. We propose a quantization scheme that allows inference to be carried out using integer-only arithmetic, which can be implemented more efficiently than floating-point inference on commonly available integer-only hardware. We also co-design a training procedure to preserve end-to-end model accuracy after quantization. As a result, the proposed quantization scheme improves the tradeoff between accuracy and on-device latency. The improvements are significant even on MobileNets, a model family known for run-time efficiency, and are demonstrated on ImageNet classification and COCO detection on popular CPUs.

Comments:	14 pages, 12 figures
Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:1712.05877 [cs.LG]
	(or arXiv:1712.05877v1 [cs.LG] for this version)

Submission history

From: Bo Chen
[v1] Fri, 15 Dec 2017 23:56:52 GMT (392kb)
