Computer Science > Computation and Language
Title: Cross-Domain Generalization Through Memorization: A Study of Nearest Neighbors in Neural Duplicate Question Detection
(Submitted on 22 Nov 2020)
Abstract: Duplicate question detection (DQD) is important for improving the efficiency of community and automatic question answering systems. Unfortunately, gathering supervised data in a domain is time-consuming and expensive, and our ability to leverage annotations across domains is minimal. In this work, we leverage neural representations and study nearest neighbors for cross-domain generalization in DQD. We first encode question pairs of the source and target domains in a rich representation space and then, using a k-nearest-neighbor retrieval-based method, aggregate the neighbors' labels and distances to rank pairs. We observe robust performance of this method in different cross-domain scenarios of the StackExchange, Spring and Quora datasets, outperforming cross-entropy classification in multiple cases.
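The retrieval step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the encoder, the distance metric, or the aggregation rule, so everything below is an assumption made for illustration: question pairs are taken to be already encoded as plain vectors, distance is Euclidean, and each neighbor's label is weighted by exp(-distance). The function names `knn_duplicate_score` and `rank_pairs` are hypothetical and are not taken from the paper.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_duplicate_score(query, source_vectors, source_labels, k=3):
    """Distance-weighted share of duplicate-labeled neighbors, in [0, 1].

    Assumption: labels are 1 (duplicate) or 0 (not duplicate), and the
    weighting exp(-distance) is an illustrative choice, not the paper's.
    """
    # Distance from the query pair to every labeled source-domain pair.
    scored = sorted(
        (euclidean(query, vec), label)
        for vec, label in zip(source_vectors, source_labels)
    )
    neighbors = scored[:k]
    # Closer neighbors receive larger weights.
    weights = [math.exp(-dist) for dist, _ in neighbors]
    total = sum(weights)
    return sum(w * label for w, (_, label) in zip(weights, neighbors)) / total


def rank_pairs(target_vectors, source_vectors, source_labels, k=3):
    """Return target-pair indices ordered from most to least likely duplicate."""
    scores = [
        knn_duplicate_score(vec, source_vectors, source_labels, k)
        for vec in target_vectors
    ]
    return sorted(range(len(target_vectors)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    # Toy, made-up vectors standing in for encoded question pairs.
    source_vectors = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
    source_labels = [1, 1, 0, 0]
    target_vectors = [[5.5, 5.0], [0.1, 0.2]]
    print(rank_pairs(target_vectors, source_vectors, source_labels, k=2))
```

In this sketch the labeled source-domain pairs act as a memory: a target-domain pair is scored only by the labeled pairs closest to it, so no classifier is trained on the target domain.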
Submission history
From: Yadollah Yaghoobzadeh
[v1] Sun, 22 Nov 2020 19:19:33 GMT (96kb,D)