Predicting the Popularity of Micro-videos with Multimodal Variational Encoder-Decoder Framework

Zhu, Yaochen; Xie, Jiayi; Chen, Zhenzhong

doi:10.1109/TMM.2021.3120537

Computer Science > Machine Learning

Title: Predicting the Popularity of Micro-videos with Multimodal Variational Encoder-Decoder Framework

Authors: Yaochen Zhu, Jiayi Xie, Zhenzhong Chen

(Submitted on 28 Mar 2020)

Abstract: As an emerging type of user-generated content, micro-videos drastically enrich people's entertainment experiences and social interactions. However, the popularity pattern of an individual micro-video remains elusive to researchers. One major challenge is that the potential popularity of a micro-video tends to fluctuate under the impact of various external factors, which makes it highly uncertain. In addition, since micro-videos are mainly uploaded by individuals who lack professional techniques, multiple types of noise could exist that obscure useful information. In this paper, we propose a multimodal variational encoder-decoder (MMVED) framework for micro-video popularity prediction tasks. MMVED learns a stochastic Gaussian embedding of a micro-video that is informative of its popularity level while preserving the inherent uncertainties. Moreover, through the optimization of a deep variational information bottleneck lower bound (IBLBO), the learned hidden representation is shown to be maximally expressive about the popularity target while maximally compressive of the noise in micro-video features. Furthermore, the Bayesian product-of-experts principle is applied to the multimodal encoder, where the decision to keep or discard information is made comprehensively with all available modalities. Extensive experiments conducted on a public dataset and a dataset we collected from Xigua demonstrate the effectiveness of the proposed MMVED framework.
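
The abstract names two ingredients that a small numeric example may make more concrete: fusing per-modality Gaussian embeddings with a product of experts, and penalizing the fused embedding with a KL term of the kind used in variational information bottleneck objectives. The sketch below is illustrative only and is not taken from the paper. It assumes diagonal Gaussian experts and a standard normal prior, and the function names `product_of_experts` and `kl_to_standard_normal`, the three modality names, and all numbers are hypothetical.

```python
import math


def product_of_experts(experts):
    """Fuse diagonal Gaussian experts into one diagonal Gaussian.

    `experts` is a list of (means, variances) pairs of equal-length lists.
    The fused precision is the sum of the expert precisions, and the fused
    mean is the precision-weighted average of the expert means.
    """
    dim = len(experts[0][0])
    fused_mean, fused_var = [], []
    for d in range(dim):
        precisions = [1.0 / var[d] for _, var in experts]
        total_precision = sum(precisions)
        weighted_sum = sum(mu[d] * p for (mu, _), p in zip(experts, precisions))
        fused_mean.append(weighted_sum / total_precision)
        fused_var.append(1.0 / total_precision)
    return fused_mean, fused_var


def kl_to_standard_normal(mean, var):
    """KL( N(mean, diag(var)) || N(0, I) ), summed over dimensions."""
    return 0.5 * sum(v + m * m - 1.0 - math.log(v) for m, v in zip(mean, var))


# Hypothetical two-dimensional embeddings from three modality encoders.
visual = ([1.0, 0.0], [1.0, 4.0])
audio = ([3.0, 2.0], [1.0, 4.0])
textual = ([2.0, 1.0], [0.5, 2.0])

fused_mean, fused_var = product_of_experts([visual, audio, textual])
kl_penalty = kl_to_standard_normal(fused_mean, fused_var)

print(fused_mean)  # [2.0, 1.0]
print(fused_var)   # [0.25, 1.0]
```

In this toy case the textual expert has the smallest variance, so it pulls the fused mean hardest, and the fused variance is smaller than that of any single expert. That is one way to read the abstract's statement that the decision to keep or discard information is made with all available modalities. Dropping a modality from the list passed to `product_of_experts` gives the fusion over the remaining ones.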

Subjects:	Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
DOI:	10.1109/TMM.2021.3120537
Cite as:	arXiv:2003.12724 [cs.LG]
	(or arXiv:2003.12724v1 [cs.LG] for this version)

Submission history

From: Zhenzhong Chen [view email]
[v1] Sat, 28 Mar 2020 06:08:16 GMT (4712kb,D)
