Computer Science > Computer Vision and Pattern Recognition
Title: Visualizing the embedding space to explain the effect of knowledge distillation
(Submitted on 9 Oct 2021)
Abstract: Recent research has found that knowledge distillation can be effective both in reducing the size of a network and in improving generalization. For example, a large, pre-trained teacher network was shown to be able to bootstrap a student model that eventually outperforms the teacher in a limited-label environment. Despite these advances, it is still relatively unclear why this method works, that is, what the resulting student model does 'better'. To address this issue, we use two non-linear, low-dimensional embedding methods (t-SNE and IVIS) to visualize the representation spaces of different layers in a network. We perform an extensive set of experiments with different architecture parameters and distillation methods. The resulting visualizations and metrics clearly show that distillation guides the network, relative to its non-distilled version, to find a more compact representation space for higher accuracy already in earlier layers.
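The abstract does not say which metrics are used to quantify a "more compact" representation space. As a purely illustrative sketch, one simple way to put a number on compactness is the ratio of the mean within-class distance (each point to its class centroid) to the mean distance between class centroids; lower values mean tighter, better-separated clusters. The function name `compactness_ratio` and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict


def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def compactness_ratio(embeddings, labels):
    """Hypothetical compactness score (an assumption, not the paper's metric).

    Returns mean within-class distance to the class centroid divided by the
    mean pairwise distance between class centroids. Lower is more compact.
    """
    by_class = defaultdict(list)
    for vec, lab in zip(embeddings, labels):
        by_class[lab].append(vec)
    if len(by_class) < 2:
        raise ValueError("need at least two classes")

    centroids = {lab: centroid(vecs) for lab, vecs in by_class.items()}

    # Distance of every point to the centroid of its own class.
    within = [math.dist(vec, centroids[lab])
              for vec, lab in zip(embeddings, labels)]

    # Distance between every unordered pair of class centroids.
    labs = sorted(centroids)
    between = [math.dist(centroids[a], centroids[b])
               for i, a in enumerate(labs) for b in labs[i + 1:]]

    mean_between = sum(between) / len(between)
    if mean_between == 0:
        raise ValueError("class centroids coincide")
    return (sum(within) / len(within)) / mean_between


# Toy 2-D embeddings standing in for one layer's low-dimensional projection.
labels = [0, 0, 1, 1]
tight = [[0.0, 0.0], [0.0, 0.2], [4.0, 0.0], [4.0, 0.2]]
loose = [[0.0, 0.0], [0.0, 2.0], [4.0, 0.0], [4.0, 2.0]]

ratio_tight = compactness_ratio(tight, labels)
ratio_loose = compactness_ratio(loose, labels)
print(ratio_tight, ratio_loose)
```

In a comparison of the kind the abstract describes, such a score would be computed per layer for the distilled and the non-distilled network; the embeddings here are hand-made stand-ins, not model activations.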
Submission history
From: Christian Wallraven
[v1] Sat, 9 Oct 2021 07:04:26 GMT (4229kb,D)