Multimedia

New submissions

[ total of 6 entries: 1-6 ]

New submissions for Fri, 9 Jun 23

[1]  arXiv:2306.05241 [pdf, other]
Title: Two Heads Are Better Than One: Improving Fake News Video Detection by Correlating with Neighbors
Comments: To appear in ACL 2023 Findings
Subjects: Multimedia (cs.MM)

The prevalence of short video platforms has spawned many fake news videos, which propagate more readily than textual fake news. Automatically detecting fake news videos has therefore become an important countermeasure in practice. Previous works commonly verify each news video individually with multimodal information. Nevertheless, news videos covering the same event from different perspectives are commonly posted together; they contain complementary or contradictory information and can therefore be used to evaluate one another. To this end, we introduce a new and practical paradigm, i.e., cross-sample fake news video detection, and propose a novel framework, Neighbor-Enhanced fakE news video Detection (NEED), which integrates the neighborhood relationship of news videos belonging to the same event. NEED can be readily combined with existing single-sample detectors and further enhances their performance with the proposed graph aggregation (GA) and debunking rectification (DR) modules. Specifically, given the feature representations obtained from single-sample detectors, GA aggregates the neighborhood information with the dynamic graph to enrich the features of independent samples. After that, DR explicitly leverages the relationship between debunking videos and fake news videos to refute the candidate videos via textual and visual consistency. Extensive experiments on the public benchmark demonstrate that NEED greatly improves the performance of both single-modal (up to 8.34% in accuracy) and multimodal (up to 4.97% in accuracy) base detectors. Code is available at https://github.com/ICTMCG/NEED.
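
The neighbor-aggregation idea in this abstract can be illustrated with a minimal sketch. It is not the NEED implementation: the function name `aggregate_neighbors`, the dot-product similarity, and the softmax weighting are all assumptions chosen for illustration, and the abstract does not specify how the dynamic graph is built. The sketch only shows how per-video features from a single-sample detector might be enriched with information from other videos of the same event.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def aggregate_neighbors(features):
    """Hypothetical aggregation step (an assumption, not NEED's GA module).

    Each video's feature vector is replaced by a similarity-weighted
    average over all videos of the same event, itself included.
    """
    enriched = []
    for query in features:
        # Attention-style weights from dot-product similarity.
        weights = softmax([dot(query, other) for other in features])
        mixed = [
            sum(w * other[d] for w, other in zip(weights, features))
            for d in range(len(query))
        ]
        enriched.append(mixed)
    return enriched


# Toy features for three videos about one event (made-up numbers).
event_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
enriched = aggregate_neighbors(event_features)
```

Because every output is a convex combination of the inputs, each enriched vector stays within the range spanned by the event's original features.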

Cross-lists for Fri, 9 Jun 23

[2]  arXiv:2306.05268 (cross-list from cs.LG) [pdf, other]
Title: Factorized Contrastive Learning: Going Beyond Multi-view Redundancy
Comments: Code available at: this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)

In a wide range of multimodal tasks, contrastive learning has become a particularly appealing approach since it can successfully learn representations from abundant unlabeled data with only pairing information (e.g., image-caption or video-audio pairs). Underpinning these approaches is the assumption of multi-view redundancy: that shared information between modalities is necessary and sufficient for downstream tasks. However, in many real-world settings, task-relevant information is also contained in modality-unique regions: information that is only present in one modality but still relevant to the task. How can we learn self-supervised multimodal representations to capture both shared and unique information relevant to downstream tasks? This paper proposes FactorCL, a new multimodal representation learning method to go beyond multi-view redundancy. FactorCL is built from three new contributions: (1) factorizing task-relevant information into shared and unique representations, (2) capturing task-relevant information via maximizing mutual information (MI) lower bounds and removing task-irrelevant information via minimizing MI upper bounds, and (3) multimodal data augmentations to approximate task relevance without labels. On large-scale real-world datasets, FactorCL captures both shared and unique information and achieves state-of-the-art results on six benchmarks.
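
As a concrete illustration of a mutual-information lower bound of the kind this abstract mentions, the sketch below computes the widely used InfoNCE estimate from a matrix of critic scores. Whether this matches FactorCL's exact estimator is an assumption; the function name `infonce_lower_bound` and the toy score matrices are hypothetical. With N paired samples the estimate can never exceed log N.

```python
import math


def infonce_lower_bound(scores):
    """InfoNCE estimate of a mutual-information lower bound.

    scores[i][j] is a critic score for pairing sample i of one modality
    with sample j of the other; diagonal entries are the true pairs.
    Illustrative sketch only, not the paper's estimator.
    """
    n = len(scores)
    total = 0.0
    for i, row in enumerate(scores):
        # Stable log of the mean of exp(score) over candidates in the row.
        m = max(row)
        log_mean_exp = m + math.log(sum(math.exp(s - m) for s in row) / n)
        total += row[i] - log_mean_exp
    return total / n


# Uninformative critic: every pairing scores the same.
flat = infonce_lower_bound([[0.0, 0.0], [0.0, 0.0]])

# Confident critic: true pairs score far above mismatched pairs.
sharp = infonce_lower_bound([[50.0, 0.0], [0.0, 50.0]])
```

An uninformative critic yields an estimate of zero, while a critic that cleanly separates true pairs from mismatched ones approaches the log N ceiling, which is why larger batches are needed to certify larger amounts of shared information.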

Replacements for Fri, 9 Jun 23

[3]  arXiv:2112.08691 (replaced) [pdf, other]
Title: Towards Robust Neural Image Compression: Adversarial Attack and Model Finetuning
Authors: Tong Chen, Zhan Ma
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Image and Video Processing (eess.IV)
[4]  arXiv:2302.10912 (replaced) [pdf, other]
Title: Balanced Audiovisual Dataset for Imbalance Analysis
Comments: Website: this https URL
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[5]  arXiv:2305.05139 (replaced) [src]
Title: Temporal Convolution Network Based Onset Detection and Query by Humming System Design
Comments: This paper has been withdrawn by the author due to a crucial definition of probability threshold and several grammar and vocabulary mistakes
Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[6]  arXiv:2306.03718 (replaced) [pdf, other]
Title: Emotion-Conditioned Melody Harmonization with Hierarchical Variational Autoencoder
Authors: Shulei Ji, Xinyu Yang
Comments: Accepted by IEEE SMC 2023
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
