
Multimedia

Authors and titles for recent submissions

[ total of 43 entries: 1-25 | 26-43 ]

Wed, 24 Apr 2024

[1]  arXiv:2404.14934 [pdf, other]
Title: G3R: Generating Rich and Fine-grained mmWave Radar Data from 2D Videos for Generalized Gesture Recognition
Comments: 18 pages, 29 figures
Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
[2]  arXiv:2404.14755 [pdf, other]
Title: SkinGEN: an Explainable Dermatology Diagnosis-to-Generation Framework with Interactive Vision-Language Models
Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
[3]  arXiv:2404.14687 [pdf, other]
[4]  arXiv:2404.14573 [pdf, other]
Title: Tile-Weighted Rate-Distortion Optimized Packet Scheduling for 360$^\circ$ VR Video Streaming
Comments: Accepted by IEEE Intelligent Systems
Subjects: Multimedia (cs.MM)
[5]  arXiv:2404.15276 (cross-list from cs.CV) [pdf, other]
Title: SMPLer: Taming Transformers for Monocular 3D Human Shape and Pose Estimation
Comments: Published at TPAMI 2024
Journal-ref: https://www.computer.org/csdl/journal/tp/2024/05/10354384/1SP2qWh8Fq0
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Machine Learning (cs.LG); Multimedia (cs.MM)
[6]  arXiv:2404.15143 (cross-list from cs.SD) [pdf, other]
Title: Every Breath You Don't Take: Deepfake Speech Detection Using Breath
Comments: Submitted to ACM journal -- Digital Threats: Research and Practice
Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[7]  arXiv:2404.15107 (cross-list from cs.HC) [pdf, other]
Title: MIMOSA: Human-AI Co-Creation of Computational Spatial Audio Effects on Videos
Subjects: Human-Computer Interaction (cs.HC); Multimedia (cs.MM)
[8]  arXiv:2404.15100 (cross-list from cs.CV) [pdf, other]
Title: Multimodal Large Language Model is a Human-Aligned Annotator for Text-to-Image Generation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[9]  arXiv:2404.14985 (cross-list from cs.CV) [pdf, other]
Title: Other Tokens Matter: Exploring Global and Local Features of Vision Transformers for Object Re-Identification
Comments: Accepted by CVIU2024. More modifications may be performed
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[10]  arXiv:2404.14674 (cross-list from cs.LG) [pdf, other]
Title: HOIN: High-Order Implicit Neural Representations
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)

Tue, 23 Apr 2024 (showing first 15 of 18 entries)

[11]  arXiv:2404.13993 [pdf, other]
Title: Zero-Shot Character Identification and Speaker Prediction in Comics via Iterative Multimodal Fusion
Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV)
[12]  arXiv:2404.13792 [pdf, other]
Title: Counterfactual Reasoning Using Predicted Latent Personality Dimensions for Optimizing Persuasion Outcome
Comments: 14 pages, 10 figures, Accepted by Persuasive Technology 2024
Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)
[13]  arXiv:2404.13640 [pdf, other]
Title: Beyond Alignment: Blind Video Face Restoration via Parsing-Guided Temporal-Coherent Transformer
Comments: 9 pages
Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[14]  arXiv:2404.13619 [pdf, other]
Title: Towards Unified Representation of Multi-Modal Pre-training for 3D Understanding via Differentiable Rendering
Subjects: Multimedia (cs.MM)
[15]  arXiv:2404.13134 [pdf, other]
Title: Deep Learning-based Text-in-Image Watermarking
Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[16]  arXiv:2404.14381 (cross-list from cs.CV) [pdf, other]
Title: TAVGBench: Benchmarking Text to Audible-Video Generation
Comments: Technical Report. Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[17]  arXiv:2404.14037 (cross-list from cs.CV) [pdf, other]
Title: GaussianTalker: Speaker-specific Talking Head Synthesis via 3D Gaussian Splatting
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[18]  arXiv:2404.13944 (cross-list from cs.CV) [pdf, other]
Title: Gorgeous: Create Your Desired Character Facial Makeup from Any Ideas
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[19]  arXiv:2404.13914 (cross-list from cs.SD) [pdf, other]
Title: Audio Anti-Spoofing Detection: A Survey
Comments: submitted to ACM Computing Surveys
Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[20]  arXiv:2404.13899 (cross-list from cs.CL) [pdf, other]
Title: Towards Better Text-to-Image Generation Alignment via Attention Modulation
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[21]  arXiv:2404.13808 (cross-list from cs.IR) [pdf, other]
Title: General Item Representation Learning for Cold-start Content Recommendations
Comments: 14 pages
Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG); Multimedia (cs.MM)
[22]  arXiv:2404.13789 (cross-list from cs.SD) [pdf, other]
Title: Anchor-aware Deep Metric Learning for Audio-visual Retrieval
Comments: 9 pages, 5 figures. Accepted by ACM ICMR 2024
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[23]  arXiv:2404.13628 (cross-list from cs.CL) [pdf, other]
Title: Mixture of LoRA Experts
Comments: 17 pages, 11 figures
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Multimedia (cs.MM)
[24]  arXiv:2404.13621 (cross-list from cs.CV) [pdf, other]
Title: Attack on Scene Flow using Point Clouds
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
[25]  arXiv:2404.13370 (cross-list from cs.CV) [pdf, other]
Title: Movie101v2: Improved Movie Narration Benchmark
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Multimedia (cs.MM)
