APNN-TC: Accelerating Arbitrary Precision Neural Networks on Ampere GPU Tensor Cores

Feng, Boyuan; Wang, Yuke; Geng, Tong; Li, Ang; Ding, Yufei

(Submitted on 23 Jun 2021 (this version), latest version 16 Nov 2021 (v2))

Abstract: Over the years, accelerating neural networks with quantization has been widely studied. Unfortunately, prior efforts with diverse precisions (e.g., 1-bit weights and 2-bit activations) are usually restricted by the limited precision support on GPUs (e.g., int1 and int4). To break such restrictions, we introduce the first Arbitrary Precision Neural Network framework (APNN-TC) to fully exploit quantization benefits on Ampere GPU Tensor Cores. First, APNN-TC incorporates a novel emulation algorithm that supports arbitrary short bit-width computation with int1 compute primitives and XOR/AND Boolean operations. Second, APNN-TC integrates arbitrary precision layer designs that efficiently map our emulation algorithm to Tensor Cores with novel batching strategies and specialized memory organization. Third, APNN-TC embodies a novel arbitrary precision NN design that minimizes memory access across layers and further improves performance. Extensive evaluations show that APNN-TC achieves significant speedups over CUTLASS kernels and across various NN models, such as ResNet and VGG.
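To make the emulation idea concrete, here is a minimal Python sketch of the standard bit-plane decomposition on which int1 AND/XOR emulation can be built; it is not the APNN-TC kernel. It assumes unsigned integers for the AND path and a {-1,+1} encoding (+1 stored as bit 0, -1 as bit 1) for the XOR path; which encoding the paper pairs with which Boolean operator is our assumption. Python integers stand in for packed 1-bit registers, and the helper names (`bit_planes`, `and_dot`, `xor_dot`) are hypothetical. A p-bit by q-bit dot product becomes p*q binary AND-plus-popcount products, each scaled by 2^(k+j):

```python
# Hypothetical sketch (not the APNN-TC implementation): emulate short
# bit-width dot products using only 1-bit AND/XOR plus popcount.
# Python ints stand in for packed 1-bit vectors.
from typing import List


def bit_planes(values: List[int], bits: int) -> List[int]:
    """Pack bit k of every element into one integer (bit plane k).

    Element i sets bit position i of plane k when its k-th bit is 1.
    """
    planes = []
    for k in range(bits):
        plane = 0
        for i, v in enumerate(values):
            if (v >> k) & 1:
                plane |= 1 << i
        planes.append(plane)
    return planes


def popcount(x: int) -> int:
    # bin(x).count("1") works on every Python 3 version.
    return bin(x).count("1")


def and_dot(a: List[int], w: List[int], a_bits: int, w_bits: int) -> int:
    """Unsigned dot product sum(a[i] * w[i]) via 1-bit AND + popcount.

    With a[i] = sum_k 2^k * A_k[i] and w[i] = sum_j 2^j * W_j[i]:
    a . w = sum_{k,j} 2^(k+j) * popcount(A_k AND W_j).
    """
    assert len(a) == len(w)
    a_planes = bit_planes(a, a_bits)
    w_planes = bit_planes(w, w_bits)
    total = 0
    for k, ak in enumerate(a_planes):
        for j, wj in enumerate(w_planes):
            # One binary (int1) product per pair of bit planes.
            total += popcount(ak & wj) << (k + j)
    return total


def xor_dot(a_signs: List[int], w_signs: List[int]) -> int:
    """Dot product of {-1,+1} vectors via 1-bit XOR + popcount.

    Matching positions contribute +1 and mismatches -1, so
    dot = n - 2 * popcount(A XOR W).
    """
    assert len(a_signs) == len(w_signs)
    n = len(a_signs)
    a_bits = bit_planes([1 if s < 0 else 0 for s in a_signs], 1)[0]
    w_bits = bit_planes([1 if s < 0 else 0 for s in w_signs], 1)[0]
    return n - 2 * popcount(a_bits ^ w_bits)


if __name__ == "__main__":
    a = [3, 1, 0, 2]  # 2-bit unsigned activations
    w = [1, 0, 1, 1]  # 1-bit unsigned weights
    print(and_dot(a, w, a_bits=2, w_bits=1))  # 5 == 3*1 + 1*0 + 0*1 + 2*1
    print(xor_dot([1, -1, 1, 1], [1, 1, -1, 1]))  # 0 == 1 - 1 - 1 + 1
```

Under this decomposition the number of binary products grows with the product of the two bit-widths, which is one reason short precisions such as 1-bit weights with 2-bit activations are attractive; in a GPU kernel, each binary term would presumably map onto an int1 Tensor Core product rather than a scalar loop.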

Comments:	Accepted by SC'21
Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Hardware Architecture (cs.AR); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2106.12169 [cs.DC]
	(or arXiv:2106.12169v1 [cs.DC] for this version)

Submission history

From: Boyuan Feng
[v1] Wed, 23 Jun 2021 05:39:34 GMT (1634kb,D)
[v2] Tue, 16 Nov 2021 23:11:50 GMT (1634kb,D)