We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:

Download:

Current browse context:

cs.CV

Change to browse by:

References & Citations

DBLP - CS Bibliography

Bookmark

(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo ScienceWISE logo

Computer Science > Computer Vision and Pattern Recognition

Title: Knowledge Distillation with Representative Teacher Keys Based on Attention Mechanism for Image Classification Model Compression

Abstract: With the improvement of AI chips (e.g., GPU, TPU, and NPU) and the fast development of the Internet of Things (IoT), some robust deep neural networks (DNNs) are usually composed of millions or even hundreds of millions of parameters. Such a large model may not be suitable for directly deploying on low computation and low capacity units (e.g., edge devices). Knowledge distillation (KD) has recently been recognized as a powerful model compression method to decrease the model parameters effectively. The central concept of KD is to extract useful information from the feature maps of a large model (i.e., teacher model) as a reference to successfully train a small model (i.e., student model) in which the model size is much smaller than the teacher one. Although many KD methods have been proposed to utilize the information from the feature maps of intermediate layers in the teacher model, most did not consider the similarity of feature maps between the teacher model and the student model. As a result, it may make the student model learn useless information. Inspired by the attention mechanism, we propose a novel KD method called representative teacher key (RTK) that not only considers the similarity of feature maps but also filters out the useless information to improve the performance of the target student model. In the experiments, we validate our proposed method with several backbone networks (e.g., ResNet and WideResNet) and datasets (e.g., CIFAR10, CIFAR100, SVHN, and CINIC10). The results show that our proposed RTK can effectively improve the classification accuracy of the state-of-the-art attention-based KD method.
Comments: eight pages, six figures, three tables, for AAAI-23 conference
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as: arXiv:2206.12788 [cs.CV]
  (or arXiv:2206.12788v3 [cs.CV] for this version)

Submission history

From: Jun-Teng Yang [view email]
[v1] Sun, 26 Jun 2022 05:08:50 GMT (9505kb,D)
[v2] Thu, 28 Jul 2022 07:34:59 GMT (6069kb,D)
[v3] Wed, 10 Aug 2022 02:54:40 GMT (6069kb,D)

Link back to: arXiv, form interface, contact.