Computer Science > Distributed, Parallel, and Cluster Computing
Title: APNN-TC: Accelerating Arbitrary Precision Neural Networks on Ampere GPU Tensor Cores
(Submitted on 23 Jun 2021 (this version), latest version 16 Nov 2021 (v2))
Abstract: Over the years, accelerating neural networks with quantization has been widely studied. Unfortunately, prior efforts with diverse precisions (e.g., 1-bit weights and 2-bit activations) are usually restricted by limited precision support on GPUs (e.g., int1 and int4). To break such restrictions, we introduce the first Arbitrary Precision Neural Network framework (APNN-TC) to fully exploit quantization benefits on Ampere GPU Tensor Cores. Specifically, APNN-TC first incorporates a novel emulation algorithm to support arbitrary short bit-width computation with int1 compute primitives and XOR/AND Boolean operations. Second, APNN-TC integrates arbitrary precision layer designs to efficiently map our emulation algorithm to Tensor Cores with novel batching strategies and specialized memory organization. Third, APNN-TC embodies a novel arbitrary precision NN design to minimize memory access across layers and further improve performance. Extensive evaluations show that APNN-TC can achieve significant speedup over CUTLASS kernels and various NN models, such as ResNet and VGG.
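The emulation idea in the abstract, building wider bit-widths out of int1 primitives and Boolean operations, can be illustrated with the well-known bit-plane decomposition of an integer dot product. The following is a minimal CPU sketch under stated assumptions, not the paper's GPU kernel: it handles only unsigned values, uses AND plus popcount, and does not show how the paper uses XOR. The function names `bit_planes` and `emulated_dot` are hypothetical and chosen for illustration.

```python
def bit_planes(values, bits):
    """Split unsigned integers into `bits` bit-planes, each packed into one int.

    In plane k, bit i is set when bit k of values[i] is set.
    """
    planes = []
    for k in range(bits):
        plane = 0
        for i, v in enumerate(values):
            plane |= ((v >> k) & 1) << i
        planes.append(plane)
    return planes


def emulated_dot(weights, activations, w_bits, a_bits):
    """Dot product of unsigned w_bits-bit and a_bits-bit vectors.

    Uses only 1-bit operations (AND + popcount) plus shifts and adds.
    """
    w_planes = bit_planes(weights, w_bits)
    a_planes = bit_planes(activations, a_bits)
    total = 0
    for s, wp in enumerate(w_planes):
        for t, ap in enumerate(a_planes):
            # 1-bit dot product: AND the packed planes, count the set bits,
            # then weight the partial result by 2^(s + t).
            total += bin(wp & ap).count("1") << (s + t)
    return total


# Illustrative data: 1-bit weights and 2-bit activations, as in the abstract's example.
weights = [1, 0, 1, 1]
activations = [3, 2, 1, 0]
reference = sum(w * a for w, a in zip(weights, activations))
assert emulated_dot(weights, activations, 1, 2) == reference
```

A p-bit by q-bit product decomposes into p × q one-bit products, which is why such an approach could in principle map onto hardware that natively supports only int1 operations.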
Submission history
From: Boyuan Feng
[v1] Wed, 23 Jun 2021 05:39:34 GMT (1634kb,D)
[v2] Tue, 16 Nov 2021 23:11:50 GMT (1634kb,D)