
Title: Supervised Advantage Actor-Critic for Recommender Systems

Abstract: Casting session-based or sequential recommendation as reinforcement learning (RL) through reward signals is a promising research direction towards recommender systems (RS) that maximize cumulative profits. However, the direct use of RL algorithms in the RS setting is impractical due to challenges like off-policy training, huge action spaces and lack of sufficient reward signals. Recent RL approaches for RS attempt to tackle these challenges by combining RL and (self-)supervised sequential learning, but still suffer from certain limitations. For example, the estimation of Q-values tends to be biased toward positive values due to the lack of negative reward signals. Moreover, the Q-values also depend heavily on the specific timestamp of a sequence.
To address the above problems, we propose a negative sampling strategy for training the RL component and combine it with supervised sequential learning. We call this method Supervised Negative Q-learning (SNQN). Based on sampled (negative) actions (items), we can calculate the "advantage" of a positive action over the average case, which can be further utilized as a normalized weight for learning the supervised sequential part. This leads to another learning framework: Supervised Advantage Actor-Critic (SA2C). We instantiate SNQN and SA2C with four state-of-the-art sequential recommendation models and conduct experiments on two real-world datasets. Experimental results show that the proposed approaches achieve significantly better performance than state-of-the-art supervised methods and existing self-supervised RL methods. Code will be open-sourced.
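
The idea of weighting the supervised loss by an advantage over sampled negatives can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no formulas, so the baseline (mean Q-value over the positive item plus the sampled negatives), the uniform sampler, and all names and numbers below are assumptions made for illustration only.

```python
import math
import random

def sample_negatives(positive_item, num_items, num_negatives, rng):
    """Uniformly sample distinct item ids that differ from the positive item.
    (Assumed sampler; the abstract does not specify the sampling scheme.)"""
    candidates = [i for i in range(num_items) if i != positive_item]
    return rng.sample(candidates, num_negatives)

def advantage(q_values, positive_item, negative_items):
    """Q-value of the positive item minus the mean Q-value over the positive
    item and the sampled negatives (our assumed reading of 'average case')."""
    pool = [positive_item] + list(negative_items)
    baseline = sum(q_values[i] for i in pool) / len(pool)
    return q_values[positive_item] - baseline

def weighted_cross_entropy(logits, positive_item, weight):
    """Softmax cross-entropy for the positive item, scaled by a fixed weight."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return weight * (log_z - logits[positive_item])

rng = random.Random(0)
q_values = [0.2, 1.5, 0.1, 0.4, 0.3]  # hypothetical Q-head outputs for 5 items
logits = [0.0, 2.0, 0.5, -1.0, 0.3]   # hypothetical supervised-head outputs
positive = 1                           # the item the user actually interacted with

negatives = sample_negatives(positive, num_items=5, num_negatives=2, rng=rng)
adv = advantage(q_values, positive, negatives)
loss = weighted_cross_entropy(logits, positive, adv)
```

In this sketch the advantage is treated as a constant weight on the supervised loss, so a positive item whose Q-value stands out from the sampled negatives contributes more to the supervised update. Any further normalization of the weight is left out because the abstract does not describe it.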
Comments: 9 pages, 4 figures, In Proceedings of the 15th ACM International Conference on Web Search and Data Mining (WSDM '22), February 21-25, 2022, Phoenix, Arizona. arXiv admin note: text overlap with arXiv:2006.05779
Subjects: Machine Learning (cs.LG); Information Retrieval (cs.IR)
Cite as: arXiv:2111.03474 [cs.LG]
  (or arXiv:2111.03474v1 [cs.LG] for this version)

Submission history

From: Ioannis Arapakis
[v1] Fri, 5 Nov 2021 12:51:15 GMT (538kb,D)
