Computer Science > Information Retrieval
Title: KuaiRec: A Fully-observed Dataset for Recommender Systems
(Submitted on 22 Feb 2022 (v1), revised 19 May 2022 (this version, v2), latest version 18 Aug 2022 (v3))
Abstract: Recommender systems are usually developed and evaluated on historical user-item logs. However, most offline recommendation datasets are highly sparse and contain various biases, which hampers the evaluation of recommendation policies. Existing efforts aim to improve data quality by collecting users' preferences on randomly selected items (e.g., the Yahoo! and Coat datasets). However, these datasets still suffer from high variance caused by sparsely observed data. To fundamentally solve this problem, we present KuaiRec, a fully-observed dataset collected from Kuaishou, a social video-sharing mobile app. The feedback of 1,411 users on almost all of the 3,327 videos is explicitly observed. To the best of our knowledge, this is the first real-world fully-observed recommendation dataset with millions of user-item interactions.
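The link between sparsity and high variance can be made concrete with a small simulation. The sketch below is illustrative only and rests on stated assumptions: the grid dimensions (1,411 users by 3,327 videos) come from the abstract, but the feedback scores are synthetic, a smaller toy grid stands in for the full matrix to keep the demo fast, and the helpers `density` and `estimate_spread` are hypothetical names. It estimates the average feedback from uniform random samples of cells, loosely mimicking random-exposure collection, and compares how much those estimates fluctuate when few versus many cells are observed.

```python
import random
import statistics

# Dimensions reported in the KuaiRec abstract; everything else is synthetic.
NUM_USERS = 1411
NUM_ITEMS = 3327


def density(num_observed: int, num_users: int, num_items: int) -> float:
    """Fraction of the user-item grid that carries observed feedback."""
    return num_observed / (num_users * num_items)


def estimate_spread(feedback: list[float], sample_size: int,
                    trials: int, rng: random.Random) -> float:
    """Standard deviation of the mean-feedback estimate across repeated
    uniform random samples of `sample_size` cells (hypothetical helper)."""
    estimates = [statistics.fmean(rng.sample(feedback, sample_size))
                 for _ in range(trials)]
    return statistics.stdev(estimates)


rng = random.Random(42)
# Synthetic stand-in for a fully observed matrix: a smaller toy grid keeps the
# demo fast; each cell holds a made-up feedback score in [0, 1).
toy_users, toy_items = 200, 300
feedback = [rng.random() for _ in range(toy_users * toy_items)]

true_mean = statistics.fmean(feedback)  # exact, because every cell is observed
sparse_spread = estimate_spread(feedback, sample_size=50, trials=200, rng=rng)
dense_spread = estimate_spread(feedback, sample_size=5000, trials=200, rng=rng)

full_cells = NUM_USERS * NUM_ITEMS
print(f"full grid cells: {full_cells:,}")  # 4,694,397
print(f"density if every cell is observed: {density(full_cells, NUM_USERS, NUM_ITEMS):.2f}")
print(f"true mean (full data): {true_mean:.3f}")
print(f"spread with 50 observed cells:   {sparse_spread:.4f}")
print(f"spread with 5000 observed cells: {dense_spread:.4f}")
```

Because the standard error of a sample mean shrinks roughly with the square root of the sample size, the 50-cell estimates should vary about ten times more than the 5,000-cell ones, while a fully observed grid needs no sampling at all.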
To demonstrate the advantage of KuaiRec, we use it to explore key questions in evaluating conversational recommender systems. The experimental results show that two factors in traditional partially-observed data, namely data density and exposure bias, greatly affect the evaluation results. This underscores the significance of our fully-observed data for research in many directions of recommender systems, e.g., unbiased recommendation, interactive/conversational recommendation, and evaluation. We release the dataset and the pipeline implementation for evaluation at this https URL
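How exposure bias can distort an evaluation is easiest to see on synthetic data. The following is a minimal sketch under explicit assumptions, not the paper's evaluation pipeline: the per-item `appeal` values, the exposure model, the random policy, and the function `simulate_exposure_bias` are all hypothetical. The same random policy is scored twice, once against feedback on every recommended item (what a fully-observed dataset permits) and once only against items that a biased logging system happened to expose.

```python
import random
import statistics


def simulate_exposure_bias(num_users: int = 300, num_items: int = 400,
                           k: int = 20, seed: int = 7) -> tuple[float, float]:
    """Compare a policy's hit rate on full feedback vs. exposed-only logs.

    Everything here is synthetic: `appeal`, the exposure model, and the random
    policy are illustrative assumptions, not KuaiRec's data or protocol.
    """
    rng = random.Random(seed)
    # Hypothetical per-item appeal: probability that any user likes the item.
    appeal = [rng.random() for _ in range(num_items)]

    full_hits = []    # feedback on every recommended item (fully observed)
    logged_hits = []  # feedback only where the item was exposed (partially observed)
    for _ in range(num_users):
        # Random policy: recommend k distinct items to this user.
        for item in rng.sample(range(num_items), k):
            liked = 1 if rng.random() < appeal[item] else 0
            full_hits.append(liked)
            # Hypothetical exposure model: the logging system showed high-appeal
            # items more often, so they are over-represented in the logs.
            if rng.random() < 0.05 + 0.3 * appeal[item]:
                logged_hits.append(liked)
    return statistics.fmean(full_hits), statistics.fmean(logged_hits)


true_rate, logged_rate = simulate_exposure_bias()
print(f"hit rate on fully observed feedback: {true_rate:.3f}")
print(f"hit rate on exposed-only feedback:   {logged_rate:.3f}")
```

Under this uniform-appeal setup the fully observed hit rate is about 0.5 in expectation, while the exposed-only estimate tends toward roughly 0.625, because exposure favors items users already like. The policy is identical in both cases; only the observation process differs.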
Submission history
From: Chongming Gao
[v1] Tue, 22 Feb 2022 12:08:14 GMT (1200kb,D)
[v2] Thu, 19 May 2022 06:14:33 GMT (484kb,D)
[v3] Thu, 18 Aug 2022 08:53:37 GMT (484kb,D)