Sequential Recommendation via Stochastic Self-Attention

Fan, Ziwei; Liu, Zhiwei; Wang, Alice; Nazari, Zahra; Zheng, Lei; Peng, Hao; Yu, Philip S.

(Submitted on 16 Jan 2022 (v1), last revised 5 Mar 2022 (this version, v2))

Abstract: Sequential recommendation models the dynamics of a user's previous behaviors in order to forecast the next item, and has drawn a lot of attention. Transformer-based approaches, which embed items as vectors and use dot-product self-attention to measure the relationship between items, demonstrate superior capabilities among existing sequential methods. However, users' real-world sequential behaviors are uncertain rather than deterministic, posing a significant challenge to present techniques. We further suggest that dot-product-based approaches cannot fully capture collaborative transitivity, which can be derived from item-item transitions inside sequences and is beneficial for cold-start items. We also argue that the BPR loss places no constraint on positive and sampled negative items, which misleads the optimization. We propose a novel STOchastic Self-Attention (STOSA) to overcome these issues. STOSA, in particular, embeds each item as a stochastic Gaussian distribution, the covariance of which encodes the uncertainty. We devise a novel Wasserstein Self-Attention module to characterize item-item position-wise relationships in sequences, which effectively incorporates uncertainty into model training. Wasserstein attention also facilitates learning collaborative transitivity, as the Wasserstein distance satisfies the triangle inequality. Moreover, we introduce a novel regularization term to the ranking loss, which ensures dissimilarity between positive and negative items. Extensive experiments on five real-world benchmark datasets demonstrate the superiority of the proposed model over state-of-the-art baselines, especially on cold-start items. The code is available at this https URL.
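
The idea of scoring item pairs by a distance between Gaussian embeddings, rather than by a dot product, can be illustrated with a small sketch. It is not the authors' implementation. It assumes diagonal covariances (parameterized by per-dimension standard deviations), uses the standard closed form of the squared 2-Wasserstein distance between such Gaussians, and turns negative squared distances into causal attention weights with a softmax. The function names `wasserstein2_sq` and `wasserstein_attention`, the causal mask, and the softmax normalization are all hypothetical choices made for illustration.

```python
import numpy as np

def wasserstein2_sq(mu_a, sigma_a, mu_b, sigma_b):
    """Squared 2-Wasserstein distance between diagonal Gaussians.

    mu_*: means, shape (..., d).
    sigma_*: standard deviations (not variances), shape (..., d).
    For diagonal covariances the closed form is
    ||mu_a - mu_b||^2 + ||sigma_a - sigma_b||^2.
    """
    mean_term = np.sum((mu_a - mu_b) ** 2, axis=-1)
    std_term = np.sum((sigma_a - sigma_b) ** 2, axis=-1)
    return mean_term + std_term

def wasserstein_attention(mu, sigma):
    """Causal attention weights for one sequence of n items.

    Assumption: a smaller distance means a stronger relationship, so the
    negative squared distance is used as the attention score.
    mu, sigma: shape (n, d). Returns an (n, n) row-stochastic matrix.
    """
    n = mu.shape[0]
    # Broadcast to all (query, key) pairs: result has shape (n, n).
    dist = wasserstein2_sq(mu[:, None, :], sigma[:, None, :],
                           mu[None, :, :], sigma[None, :, :])
    scores = -dist
    # Causal mask: position i may only attend to positions j <= i.
    mask = np.tril(np.ones((n, n), dtype=bool))
    scores = np.where(mask, scores, -np.inf)
    # Numerically stable softmax over the key axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)

# Toy sequence: 4 items, 8-dimensional Gaussian embeddings.
rng = np.random.default_rng(0)
mu = rng.normal(size=(4, 8))
sigma = np.abs(rng.normal(size=(4, 8))) + 1e-3  # keep std devs positive
attn = wasserstein_attention(mu, sigma)

# The (unsquared) distance is a metric, so it obeys the triangle inequality.
def w2(i, j):
    return np.sqrt(wasserstein2_sq(mu[i], sigma[i], mu[j], sigma[j]))

triangle_holds = w2(0, 2) <= w2(0, 1) + w2(1, 2) + 1e-12
```

Under these assumptions the distance reduces to a Euclidean distance over the concatenated means and standard deviations, which is why the triangle inequality holds for it, whereas a dot-product score carries no such guarantee.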

Comments:	updated version for camera-ready
Subjects:	Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2201.06035 [cs.IR]
	(or arXiv:2201.06035v2 [cs.IR] for this version)

Submission history

From: Ziwei Fan
[v1] Sun, 16 Jan 2022 12:38:45 GMT (8347kb,D)
[v2] Sat, 5 Mar 2022 17:00:57 GMT (4708kb,D)
