Optimal Gradient Quantization Condition for Communication-Efficient Distributed Training

Authors: An Xu, Zhouyuan Huo, Heng Huang

(Submitted on 25 Feb 2020)

Abstract: Communicating gradients is costly when training deep neural networks on multiple devices in computer vision applications. In particular, as deep learning models grow larger, their communication overhead grows with them and prevents the ideal linear speedup in the number of devices. Gradient quantization is a common method for reducing communication costs. However, it introduces quantization error during training, which can degrade model performance. In this work, we derive the optimal condition for both binary and multi-level gradient quantization for ANY gradient distribution. Based on this condition, we develop two novel quantization schemes: the biased BinGrad for binary quantization and the unbiased ORQ for multi-level quantization. Both schemes dynamically determine the optimal quantization levels. Extensive experiments on the CIFAR and ImageNet datasets with several popular convolutional neural networks show the superiority of our proposed methods.
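The abstract does not state the optimal conditions themselves. The plain-Python sketch below only illustrates the two settings it contrasts. The first is a biased two-level quantizer whose levels are refined with the classic Lloyd-Max conditions for squared error: each element goes to the nearer level, and each level is the mean of its elements. The second is unbiased stochastic rounding onto several levels. All function names are hypothetical. The evenly spaced level grid is a placeholder assumption, since ORQ chooses its levels from the gradient distribution. Neither function should be read as the paper's BinGrad or ORQ.

```python
import random
from bisect import bisect_right
from typing import List, Sequence, Tuple


def binary_quantize(grad: Sequence[float], iters: int = 50) -> Tuple[float, float, List[float]]:
    """Biased two-level quantizer (hypothetical sketch, not the paper's BinGrad).

    Alternates the two Lloyd-Max conditions for minimizing squared error
    with two levels b1 < b2:
      * each element is assigned to the nearer level (threshold at the midpoint);
      * each level is set to the mean of the elements assigned to it.
    """
    b1, b2 = min(grad), max(grad)
    if b1 == b2:  # constant gradient: a single level represents it exactly
        return b1, b2, [b1] * len(grad)
    for _ in range(iters):
        t = (b1 + b2) / 2.0
        low = [g for g in grad if g <= t]
        high = [g for g in grad if g > t]
        if not low or not high:  # guard against a degenerate split
            break
        new_b1 = sum(low) / len(low)
        new_b2 = sum(high) / len(high)
        if (new_b1, new_b2) == (b1, b2):  # fixed point: both conditions hold
            break
        b1, b2 = new_b1, new_b2
    t = (b1 + b2) / 2.0
    return b1, b2, [b1 if g <= t else b2 for g in grad]


def uniform_levels(grad: Sequence[float], num_levels: int) -> List[float]:
    """Evenly spaced levels spanning [min(grad), max(grad)] (placeholder choice)."""
    if num_levels < 2:
        raise ValueError("need at least two levels")
    lo, hi = min(grad), max(grad)
    step = (hi - lo) / (num_levels - 1)
    levels = [lo + k * step for k in range(num_levels)]
    levels[-1] = hi  # remove round-off so every element lies inside the grid
    return levels


def stochastic_quantize(
    grad: Sequence[float], levels: Sequence[float], rng: random.Random
) -> List[float]:
    """Unbiased stochastic rounding onto sorted levels.

    An element g with lo <= g < hi (neighbouring levels) is sent to hi with
    probability p = (g - lo) / (hi - lo) and to lo otherwise, so
    E[q] = lo + p * (hi - lo) = g. Elements outside [levels[0], levels[-1]]
    are clamped, which breaks unbiasedness, so the levels should span the data.
    """
    levels = sorted(levels)
    out: List[float] = []
    for g in grad:
        if g <= levels[0]:
            out.append(levels[0])
        elif g >= levels[-1]:
            out.append(levels[-1])
        else:
            i = bisect_right(levels, g) - 1  # levels[i] <= g < levels[i + 1]
            lo, hi = levels[i], levels[i + 1]
            p = (g - lo) / (hi - lo)
            out.append(hi if rng.random() < p else lo)
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    grad = [-0.9, -0.1, 0.05, 0.2, 1.3]
    b1, b2, q = binary_quantize(grad)
    print("binary levels:", b1, b2, "quantized:", q)
    print("multi-level:", stochastic_quantize(grad, uniform_levels(grad, 4), rng))
```

The binary quantizer is biased because each element is replaced by the mean of its group, not by a random value whose expectation equals the element. The stochastic quantizer removes that bias at the cost of extra variance, which is the trade-off the abstract's "biased" and "unbiased" labels refer to.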

Subjects:	Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (stat.ML)
Cite as:	arXiv:2002.11082 [cs.LG]
	(or arXiv:2002.11082v1 [cs.LG] for this version)

Submission history

From: An Xu
[v1] Tue, 25 Feb 2020 18:28:39 GMT (256kb,D)