Title: Joint Representation Learning and Novel Category Discovery on Single- and Multi-modal Data

Abstract: This paper studies the problem of novel category discovery on single- and multi-modal data with labels from different but relevant categories. We present a generic, end-to-end framework to jointly learn a reliable representation and assign clusters to unlabelled data. To avoid over-fitting the learnt embedding to labelled data, we take inspiration from self-supervised representation learning by noise-contrastive estimation and extend it to jointly handle labelled and unlabelled data. In particular, we propose using category discrimination on labelled data and cross-modal discrimination on multi-modal data to augment the instance discrimination used in conventional contrastive learning approaches. We further employ the Winner-Take-All (WTA) hashing algorithm on the shared representation space to generate pairwise pseudo labels for unlabelled data, which helps to better predict cluster assignments. We thoroughly evaluate our framework on the large-scale multi-modal video benchmarks Kinetics-400 and VGG-Sound, and on the image benchmarks CIFAR10, CIFAR100 and ImageNet, obtaining state-of-the-art results.
Comments: ICCV 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2104.12673 [cs.CV]
  (or arXiv:2104.12673v3 [cs.CV] for this version)
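
The abstract mentions using Winner-Take-All (WTA) hashing on the shared representation space to generate pairwise pseudo labels. The abstract does not give the details, so the following is only a minimal NumPy sketch of generic WTA hashing under stated assumptions: the function names `wta_hash` and `pairwise_pseudo_labels`, the parameters `num_hashes`, `window` and `min_matches`, and the rule "two samples form a positive pair when enough hash codes agree" are all hypothetical illustrations, not the authors' implementation.

```python
import numpy as np


def wta_hash(features, num_hashes=16, window=4, seed=0):
    """Generic WTA hash (illustrative; parameter names are assumptions).

    For each hash, pick `window` feature dimensions via a random
    permutation and record which of them holds the largest value.
    """
    rng = np.random.default_rng(seed)
    n, d = features.shape
    codes = np.empty((n, num_hashes), dtype=np.int64)
    for h in range(num_hashes):
        # First `window` entries of a random permutation of the dimensions.
        idx = rng.permutation(d)[:window]
        # The code is the position of the maximum within that window.
        codes[:, h] = np.argmax(features[:, idx], axis=1)
    return codes


def pairwise_pseudo_labels(codes, min_matches):
    """Label a pair 1 if at least `min_matches` hash codes agree, else 0.

    The thresholding rule is an assumption made for this sketch.
    """
    matches = (codes[:, None, :] == codes[None, :, :]).sum(axis=2)
    return (matches >= min_matches).astype(np.int64)


if __name__ == "__main__":
    embeddings = np.random.default_rng(1).normal(size=(6, 32))
    codes = wta_hash(embeddings, num_hashes=16, window=4)
    print(pairwise_pseudo_labels(codes, min_matches=8))
```

Because each code depends only on the rank order of the selected dimensions, the hash is unchanged by any strictly increasing rescaling of the embedding, which is one common motivation for comparing ranks rather than raw distances.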

Submission history

From: Kai Han
[v1] Mon, 26 Apr 2021 15:56:16 GMT (4879kb,D)
[v2] Tue, 27 Apr 2021 09:00:44 GMT (4879kb,D)
[v3] Thu, 14 Oct 2021 22:43:27 GMT (1286kb,D)