Title: Structure and Semantics Preserving Document Representations

Abstract: Retrieving relevant documents from a corpus is typically based on the semantic similarity between the document content and the query text. Including structural relationships between documents can benefit the retrieval mechanism by addressing semantic gaps. However, incorporating these relationships requires tractable mechanisms that balance structure with semantics and take advantage of the prevalent pre-train/fine-tune paradigm. We propose a holistic approach to learning document representations that integrates intra-document content with inter-document relations. Our deep metric learning solution analyzes the complex neighborhood structure in the relationship network to efficiently sample similar/dissimilar document pairs, and defines a novel quintuplet loss function that simultaneously encourages semantically relevant document pairs to be closer and structurally unrelated pairs to be far apart in the representation space. Furthermore, the separation margins between documents are varied flexibly to encode the heterogeneity in relationship strengths. The model is fully fine-tunable and natively supports query projection during inference. We demonstrate that it outperforms competing methods on multiple datasets for document retrieval tasks.
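
The abstract does not give the form of the quintuplet loss, so the following is only a minimal sketch of one possible reading, not the authors' formulation. It assumes a quintuplet made of an anchor plus a semantic positive, a semantic negative, a structural positive, and a structural negative, combines two hinge terms over Euclidean distances, and derives each margin from a relationship strength. All names here (`quintuplet_loss`, `margin_from_strength`, the `base` and `scale` parameters) are hypothetical and chosen for illustration.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def euclidean(u: Vector, v: Vector) -> float:
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def hinge(pos_dist: float, neg_dist: float, margin: float) -> float:
    """Zero once the negative is at least `margin` farther away than the positive."""
    return max(0.0, pos_dist - neg_dist + margin)


def margin_from_strength(strength: float, base: float = 0.5, scale: float = 1.0) -> float:
    """Hypothetical rule: a stronger relationship asks for a wider separation margin."""
    return base + scale * strength


def quintuplet_loss(
    anchor: Vector,
    sem_pos: Vector,
    sem_neg: Vector,
    struct_pos: Vector,
    struct_neg: Vector,
    sem_margin: float = 1.0,
    struct_margin: float = 1.0,
) -> float:
    """Assumed loss: the sum of a semantic and a structural hinge term.

    This is an illustrative stand-in, not the loss defined in the paper.
    """
    semantic_term = hinge(
        euclidean(anchor, sem_pos), euclidean(anchor, sem_neg), sem_margin
    )
    structural_term = hinge(
        euclidean(anchor, struct_pos), euclidean(anchor, struct_neg), struct_margin
    )
    return semantic_term + structural_term


if __name__ == "__main__":
    loss = quintuplet_loss(
        anchor=[0.0, 0.0],
        sem_pos=[1.0, 0.0],
        sem_neg=[3.0, 0.0],
        struct_pos=[0.0, 1.0],
        struct_neg=[0.0, 1.5],
        sem_margin=1.0,
        struct_margin=margin_from_strength(0.5),
    )
    print(loss)
```

In this toy call the semantic negative is already more than one margin farther from the anchor than the semantic positive, so only the structural term contributes. In a real fine-tuning setup the distances would be computed on encoder outputs and averaged over a batch of sampled quintuplets.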
Subjects: Information Retrieval (cs.IR)
Cite as: arXiv:2201.03720 [cs.IR]
  (or arXiv:2201.03720v3 [cs.IR] for this version)

Submission history

From: Natraj Raman
[v1] Tue, 11 Jan 2022 00:48:32 GMT (643kb,D)
[v2] Wed, 12 Jan 2022 01:15:06 GMT (643kb,D)
[v3] Sat, 2 Apr 2022 00:43:42 GMT (644kb,D)