Multimedia

Authors and titles for recent submissions

Total of 43 entries; showing entries 1-25.

Wed, 24 Apr 2024

[1] arXiv:2404.14934 [pdf, other]: Title: G3R: Generating Rich and Fine-grained mmWave Radar Data from 2D Videos for Generalized Gesture Recognition

Authors: Kaikai Deng, Dong Zhao, Wenxin Zheng, Yue Ling, Kangwen Yin, Huadong Ma

Comments: 18 pages, 29 figures

Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
[2] arXiv:2404.14755 [pdf, other]: Title: SkinGEN: an Explainable Dermatology Diagnosis-to-Generation Framework with Interactive Vision-Language Models

Authors: Bo Lin, Yingjing Xu, Xuanwen Bao, Zhou Zhao, Zuyong Zhang, Zhouyang Wang, Jie Zhang, Shuiguang Deng, Jianwei Yin

Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
[3] arXiv:2404.14687 [pdf, other]: Title: Pegasus-v1 Technical Report

Authors: Raehyuk Jung, Hyojun Go, Jaehyuk Yi, Jiho Jang, Daniel Kim, Jay Suh, Aiden Lee, Cooper Han, Jae Lee, Jeff Kim, Jin-Young Kim, Junwan Kim, Kyle Park, Lucas Lee, Mars Ha, Minjoon Seo, Abraham Jo, Ed Park, Hassan Kianinejad, SJ Kim, Tony Moon, Wade Jeong, Andrei Popescu, Esther Kim, EK Yoon, Genie Heo, Henry Choi, Jenna Kang, Kevin Han, Noah Seo, Sunny Nguyen, Ryan Won, Yeonhoo Park, Anthony Giuliani, Dave Chung, Hans Yoon, James Le, Jenny Ahn, June Lee, Maninder Saini, Meredith Sanders, Soyoung Lee, Sue Kim, Travis Couture

Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
[4] arXiv:2404.14573 [pdf, other]: Title: Tile-Weighted Rate-Distortion Optimized Packet Scheduling for 360° VR Video Streaming

Authors: Haopeng Wang, Haiwei Dong, Abdulmotaleb El Saddik

Comments: Accepted by IEEE Intelligent Systems

Subjects: Multimedia (cs.MM)
[5] arXiv:2404.15276 (cross-list from cs.CV) [pdf, other]: Title: SMPLer: Taming Transformers for Monocular 3D Human Shape and Pose Estimation

Authors: Xiangyu Xu, Lijuan Liu, Shuicheng Yan

Comments: Published at TPAMI 2024

Journal-ref: https://www.computer.org/csdl/journal/tp/2024/05/10354384/1SP2qWh8Fq0

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Machine Learning (cs.LG); Multimedia (cs.MM)
[6] arXiv:2404.15143 (cross-list from cs.SD) [pdf, other]: Title: Every Breath You Don't Take: Deepfake Speech Detection Using Breath

Authors: Seth Layton, Thiago De Andrade, Daniel Olszewski, Kevin Warren, Carrie Gates, Kevin Butler, Patrick Traynor

Comments: Submitted to ACM journal -- Digital Threats: Research and Practice

Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[7] arXiv:2404.15107 (cross-list from cs.HC) [pdf, other]: Title: MIMOSA: Human-AI Co-Creation of Computational Spatial Audio Effects on Videos

Authors: Zheng Ning, Zheng Zhang, Jerrick Ban, Kaiwen Jiang, Ruohong Gan, Yapeng Tian, Toby Jia-Jun Li

Subjects: Human-Computer Interaction (cs.HC); Multimedia (cs.MM)
[8] arXiv:2404.15100 (cross-list from cs.CV) [pdf, other]: Title: Multimodal Large Language Model is a Human-Aligned Annotator for Text-to-Image Generation

Authors: Xun Wu, Shaohan Huang, Furu Wei

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[9] arXiv:2404.14985 (cross-list from cs.CV) [pdf, other]: Title: Other Tokens Matter: Exploring Global and Local Features of Vision Transformers for Object Re-Identification

Authors: Yingquan Wang, Pingping Zhang, Dong Wang, Huchuan Lu

Comments: Accepted by CVIU2024. More modifications may be performed

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[10] arXiv:2404.14674 (cross-list from cs.LG) [pdf, other]: Title: HOIN: High-Order Implicit Neural Representations

Authors: Yang Chen, Ruituo Wu, Yipeng Liu, Ce Zhu

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)

Tue, 23 Apr 2024 (showing first 15 of 18 entries)

[11] arXiv:2404.13993 [pdf, other]: Title: Zero-Shot Character Identification and Speaker Prediction in Comics via Iterative Multimodal Fusion

Authors: Yingxuan Li, Ryota Hinami, Kiyoharu Aizawa, Yusuke Matsui

Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV)
[12] arXiv:2404.13792 [pdf, other]: Title: Counterfactual Reasoning Using Predicted Latent Personality Dimensions for Optimizing Persuasion Outcome

Authors: Donghuo Zeng, Roberto S. Legaspi, Yuewen Sun, Xinshuai Dong, Kazushi Ikeda, Peter Spirtes, Kun Zhang

Comments: 14 pages, 10 figures, Accepted by Persuasive Technology 2024

Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)
[13] arXiv:2404.13640 [pdf, other]: Title: Beyond Alignment: Blind Video Face Restoration via Parsing-Guided Temporal-Coherent Transformer

Authors: Kepeng Xu, Li Xu, Gang He, Wenxin Yu, Yunsong Li

Comments: 9 pages

Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[14] arXiv:2404.13619 [pdf, other]: Title: Towards Unified Representation of Multi-Modal Pre-training for 3D Understanding via Differentiable Rendering

Authors: Ben Fei, Yixuan Li, Weidong Yang, Lipeng Ma, Ying He

Subjects: Multimedia (cs.MM)
[15] arXiv:2404.13134 [pdf, other]: Title: Deep Learning-based Text-in-Image Watermarking

Authors: Bishwa Karki, Chun-Hua Tsai, Pei-Chi Huang, Xin Zhong

Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[16] arXiv:2404.14381 (cross-list from cs.CV) [pdf, other]: Title: TAVGBench: Benchmarking Text to Audible-Video Generation

Authors: Yuxin Mao, Xuyang Shen, Jing Zhang, Zhen Qin, Jinxing Zhou, Mochu Xiang, Yiran Zhong, Yuchao Dai

Comments: Technical Report. Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[17] arXiv:2404.14037 (cross-list from cs.CV) [pdf, other]: Title: GaussianTalker: Speaker-specific Talking Head Synthesis via 3D Gaussian Splatting

Authors: Hongyun Yu, Zhan Qu, Qihang Yu, Jianchuan Chen, Zhonghua Jiang, Zhiwen Chen, Shengyu Zhang, Jimin Xu, Fei Wu, Chengfei Lv, Gang Yu

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[18] arXiv:2404.13944 (cross-list from cs.CV) [pdf, other]: Title: Gorgeous: Create Your Desired Character Facial Makeup from Any Ideas

Authors: Jia Wei Sii, Chee Seng Chan

Comments: Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[19] arXiv:2404.13914 (cross-list from cs.SD) [pdf, other]: Title: Audio Anti-Spoofing Detection: A Survey

Authors: Menglu Li, Yasaman Ahmadiadli, Xiao-Ping Zhang

Comments: submitted to ACM Computing Surveys

Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[20] arXiv:2404.13899 (cross-list from cs.CL) [pdf, other]: Title: Towards Better Text-to-Image Generation Alignment via Attention Modulation

Authors: Yihang Wu, Xiao Cao, Kaixin Li, Zitan Chen, Haonan Wang, Lei Meng, Zhiyong Huang

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[21] arXiv:2404.13808 (cross-list from cs.IR) [pdf, other]: Title: General Item Representation Learning for Cold-start Content Recommendations

Authors: Jooeun Kim, Jinri Kim, Kwangeun Yeo, Eungi Kim, Kyoung-Woon On, Jonghwan Mun, Joonseok Lee

Comments: 14 pages

Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG); Multimedia (cs.MM)
[22] arXiv:2404.13789 (cross-list from cs.SD) [pdf, other]: Title: Anchor-aware Deep Metric Learning for Audio-visual Retrieval

Authors: Donghuo Zeng, Yanan Wang, Kazushi Ikeda, Yi Yu

Comments: 9 pages, 5 figures. Accepted by ACM ICMR 2024

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[23] arXiv:2404.13628 (cross-list from cs.CL) [pdf, other]: Title: Mixture of LoRA Experts

Authors: Xun Wu, Shaohan Huang, Furu Wei

Comments: 17 pages, 11 figures

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Multimedia (cs.MM)
[24] arXiv:2404.13621 (cross-list from cs.CV) [pdf, other]: Title: Attack on Scene Flow using Point Clouds

Authors: Haniyeh Ehsani Oskouie, Mohammad-Shahram Moin, Shohreh Kasaei

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
[25] arXiv:2404.13370 (cross-list from cs.CV) [pdf, other]: Title: Movie101v2: Improved Movie Narration Benchmark

Authors: Zihao Yue, Yepeng Zhang, Ziheng Wang, Qin Jin

Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Multimedia (cs.MM)
