HUSE: Hierarchical Universal Semantic Embeddings

Narayana, Pradyumna; Pednekar, Aniket; Krishnamoorthy, Abishek; Sone, Kazoo; Basu, Sugato

(Submitted on 14 Nov 2019)

Abstract: There has been a recent surge of interest in cross-modal representation learning for images and text. The main challenge lies in mapping images and text to a shared latent space where embeddings corresponding to similar semantic concepts lie closer to each other than embeddings corresponding to different semantic concepts, irrespective of modality. Ranking losses are commonly used to create such a shared latent space; however, they impose no constraints on inter-class relationships, so neighboring clusters can be completely unrelated. Works on visual semantic embeddings address this problem by first constructing a semantic embedding space from external knowledge and then projecting image embeddings onto this fixed semantic embedding space. These works are confined to the image domain, and constraining the embeddings to a fixed space adds a burden on learning. This paper proposes a novel method, HUSE, to learn cross-modal representations with semantic information. HUSE learns a shared latent space where the distance between any two universal embeddings is similar to the distance between their corresponding class embeddings in the semantic embedding space. HUSE also uses a classification objective with a shared classification layer to ensure that the image and text embeddings lie in the same shared latent space. Experiments on UPMC Food-101 show that our method outperforms the previous state of the art on retrieval, hierarchical precision, and classification.
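
The distance-matching idea in the abstract can be illustrated with a small sketch. The abstract does not give the distance metric or the exact form of the loss, so the cosine distance, the squared-error penalty, and the names `cosine_distance` and `semantic_alignment_loss` are assumptions made for illustration, not the authors' formulation.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def semantic_alignment_loss(embeddings, labels, class_embeddings):
    """Mean squared gap between two pairwise distances.

    For every pair of samples, compare the distance between their
    universal embeddings with the distance between the embeddings of
    their classes in the semantic space. Squared error is an assumed
    penalty; the abstract only says the two distances should be similar.
    """
    total, count = 0.0, 0
    n = len(embeddings)
    for i in range(n):
        for j in range(i + 1, n):
            d_emb = cosine_distance(embeddings[i], embeddings[j])
            d_cls = cosine_distance(
                class_embeddings[labels[i]], class_embeddings[labels[j]]
            )
            total += (d_emb - d_cls) ** 2
            count += 1
    return total / count if count else 0.0


# Hypothetical toy data: two classes with orthogonal class embeddings.
class_embeddings = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
aligned = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]
aligned_loss = semantic_alignment_loss(aligned, ["a", "b", "a"], class_embeddings)

# Two samples of different classes that collapse onto the same point.
collapsed = [[1.0, 0.0], [1.0, 0.0]]
collapsed_loss = semantic_alignment_loss(collapsed, ["a", "b"], class_embeddings)
```

In this toy setup the loss is zero when the embeddings reproduce the class-level geometry and grows when samples from semantically distant classes are placed close together. A full system would combine a term like this with the shared classification objective the abstract mentions.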

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:1911.05978 [cs.CV]
	(or arXiv:1911.05978v1 [cs.CV] for this version)

Submission history

From: Pradyumna Narayana
[v1] Thu, 14 Nov 2019 07:45:32 GMT (1074kb,D)
