Computer Vision and Pattern Recognition

Authors and titles for recent submissions, skipping first 311

[ total of 790 entries; showing 312-790 ]

Wed, 29 May 2024 (continued, showing last 101 of 152 entries)

[312]  arXiv:2405.17928 [pdf, other]
Title: Relational Self-supervised Distillation with Compact Descriptors for Image Copy Detection
Comments: 12 pages, 8 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[313]  arXiv:2405.17926 [pdf, other]
Title: SarcNet: A Novel AI-based Framework to Automatically Analyze and Score Sarcomere Organizations in Fluorescently Tagged hiPSC-CMs
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[314]  arXiv:2405.17916 [pdf, other]
Title: Boosting General Trimap-free Matting in the Real-World Image
Comments: 13 pages, 8 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[315]  arXiv:2405.17913 [pdf, other]
Title: OV-DQUO: Open-Vocabulary DETR with Denoising Text Query Training and Open-World Unknown Objects Supervision
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[316]  arXiv:2405.17905 [pdf, other]
Title: Cycle-YOLO: A Efficient and Robust Framework for Pavement Damage Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG)
[317]  arXiv:2405.17903 [pdf, other]
Title: Reliable Object Tracking by Multimodal Hybrid Feature Extraction and Transformer-Based Fusion
Comments: 16 pages, 7 figures, 9 tables; This work has been submitted for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Computer Vision and Pattern Recognition (cs.CV); Neurons and Cognition (q-bio.NC)
[318]  arXiv:2405.17901 [pdf, other]
Title: Near-Infrared and Low-Rank Adaptation of Vision Transformers in Remote Sensing
Comments: 7 pages, 3 figures, 3 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[319]  arXiv:2405.17894 [pdf, other]
Title: White-box Multimodal Jailbreaks Against Large Vision-Language Models
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[320]  arXiv:2405.17891 [pdf, other]
Title: A Refined 3D Gaussian Representation for High-Quality Dynamic Scene Reconstruction
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[321]  arXiv:2405.17886 [pdf, ps, other]
Title: Graphomotor and Handwriting Disabilities Rating Scale (GHDRS): towards complex and objective assessment
Journal-ref: Australian Journal of Learning Difficulties, Routledge, 1-34, 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[322]  arXiv:2405.17877 [pdf, other]
Title: Sparsity- and Hybridity-Inspired Visual Parameter-Efficient Fine-Tuning for Medical Diagnosis
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[323]  arXiv:2405.17873 [pdf, other]
Title: MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization
Comments: Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[324]  arXiv:2405.17872 [pdf, other]
Title: HFGS: 4D Gaussian Splatting with Emphasis on Spatial and Temporal High-Frequency Components for Endoscopic Scene Reconstruction
Comments: 13 pages, 4 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[325]  arXiv:2405.17871 [pdf, other]
Title: Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[326]  arXiv:2405.17859 [pdf, other]
Title: Adapting Pre-Trained Vision Models for Novel Instance Detection and Segmentation
Comments: 22 pages, 9 figures, Code is available at: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[327]  arXiv:2405.17855 [pdf, other]
Title: A Deep Neural Network Approach to Fare Evasion
Comments: 4 pages, 4 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[328]  arXiv:2405.17842 [pdf, other]
Title: Discriminator-Guided Cooperative Diffusion for Joint Audio and Video Generation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[329]  arXiv:2405.17835 [pdf, other]
Title: Deform3DGS: Flexible Deformation for Fast Surgical Scene Reconstruction with Gaussian Splatting
Comments: Early accepted at MICCAI 2024, 10 pages, 2 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[330]  arXiv:2405.17825 [pdf, other]
Title: Diffusion Model Patching via Mixture-of-Prompts
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[331]  arXiv:2405.17824 [pdf, other]
Title: mTREE: Multi-Level Text-Guided Representation End-to-End Learning for Whole Slide Image Analysis
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[332]  arXiv:2405.17821 [pdf, other]
Title: RITUAL: Random Image Transformations as a Universal Anti-hallucination Lever in LVLMs
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[333]  arXiv:2405.17820 [pdf, other]
Title: Don't Miss the Forest for the Trees: Attentional Vision Calibration for Large Vision Language Models
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[334]  arXiv:2405.17818 [pdf, other]
Title: Hyperspectral and multispectral image fusion with arbitrary resolution through self-supervised representations
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[335]  arXiv:2405.17817 [pdf, other]
Title: Benchmarking Skeleton-based Motion Encoder Models for Clinical Applications: Estimating Parkinson's Disease Severity in Walking Sequences
Journal-ref: IEEE International Conference on Automatic Face and Gesture Recognition (FG 2024)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[336]  arXiv:2405.17816 [pdf, other]
Title: Pursuing Feature Separation based on Neural Collapse for Out-of-Distribution Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[337]  arXiv:2405.17815 [pdf, other]
Title: Visual Anchors Are Strong Information Aggregators For Multimodal Large Language Model
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[338]  arXiv:2405.17814 [pdf, other]
Title: FAIntbench: A Holistic and Precise Benchmark for Bias Evaluation in Text-to-Image Models
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[339]  arXiv:2405.17793 [pdf, other]
Title: SafeguardGS: 3D Gaussian Primitive Pruning While Avoiding Catastrophic Scene Destruction
Comments: Comprehensive experiments are in progress
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[340]  arXiv:2405.17790 [pdf, other]
Title: Instruct-ReID++: Towards Universal Purpose Instruction-Guided Person Re-identification
Comments: arXiv admin note: substantial text overlap with arXiv:2306.07520
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[341]  arXiv:2405.17788 [pdf, other]
Title: Enhancing Road Safety: Real-Time Detection of Driver Distraction through Convolutional Neural Networks
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[342]  arXiv:2405.17774 [pdf, other]
Title: Gradually Vanishing Gap in Prototypical Network for Unsupervised Domain Adaptation
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[343]  arXiv:2405.17773 [pdf, other]
Title: Towards a Generalist and Blind RGB-X Tracker
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[344]  arXiv:2405.17765 [pdf, other]
Title: PTM-VQA: Efficient Video Quality Assessment Leveraging Diverse PreTrained Models from the Wild
Comments: CVPR 2024, 11 pages, 4 figures, 7 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[345]  arXiv:2405.17730 [pdf, other]
Title: MMPareto: Boosting Multimodal Learning with Innocent Unimodal Assistance
Authors: Yake Wei, Di Hu
Comments: Accepted by ICML2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM)
[346]  arXiv:2405.17729 [pdf, other]
Title: Hierarchical Action Recognition: A Contrastive Video-Language Approach with Hierarchical Interactions
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[347]  arXiv:2405.17725 [pdf, other]
Title: Color Shift Estimation-and-Correction for Image Enhancement
Comments: CVPR2024 accepted paper
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[348]  arXiv:2405.17720 [pdf, other]
Title: MindFormer: A Transformer Architecture for Multi-Subject Brain Decoding via fMRI
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[349]  arXiv:2405.17719 [pdf, other]
Title: EgoNCE++: Do Egocentric Video-Language Models Really Understand Hand-Object Interactions?
Comments: Code: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[350]  arXiv:2405.17718 [pdf, other]
Title: AdapNet: Adaptive Noise-Based Network for Low-Quality Image Retrieval
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[351]  arXiv:2405.17705 [pdf, other]
Title: DC-Gaussian: Improving 3D Gaussian Splatting for Reflective Dash Cam Videos
Comments: 9 pages, 7 figures; project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[352]  arXiv:2405.17704 [pdf, other]
Title: Consistency Regularisation for Unsupervised Domain Adaptation in Monocular Depth Estimation
Comments: Accepted to Conference on Lifelong Learning Agents (CoLLAs) 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[353]  arXiv:2405.17698 [pdf, other]
Title: BaboonLand Dataset: Tracking Primates in the Wild and Automating Behaviour Recognition from Drone Videos
Comments: Dataset will be published shortly
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[354]  arXiv:2405.17686 [pdf, other]
Title: Towards Causal Physical Error Discovery in Video Analytics Systems
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[355]  arXiv:2405.17680 [pdf, other]
Title: Deciphering Movement: Unified Trajectory Generation Model for Multi-Agent
Authors: Yi Xu, Yun Fu
Comments: Datasets, code, and model weights are available at: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[356]  arXiv:2405.17678 [pdf, other]
Title: TIMA: Text-Image Mutual Awareness for Balancing Zero-Shot Adversarial Robustness and Generalization Ability
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[357]  arXiv:2405.17677 [pdf, other]
Title: Understanding differences in applying DETR to natural and medical images
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[358]  arXiv:2405.17673 [pdf, other]
Title: Fast Samplers for Inverse Problems in Iterative Refinement Models
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)
[359]  arXiv:2405.17661 [pdf, other]
Title: RefDrop: Controllable Consistency in Image or Video Generation via Reference Feature Guidance
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[360]  arXiv:2405.17660 [pdf, other]
Title: LoReTrack: Efficient and Accurate Low-Resolution Transformer Tracking
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[361]  arXiv:2405.17613 [pdf, other]
Title: A Framework for Multi-modal Learning: Jointly Modeling Inter- & Intra-Modality Dependencies
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
[362]  arXiv:2405.17609 [pdf, other]
Title: GarmentCodeData: A Dataset of 3D Made-to-Measure Garments With Sewing Patterns
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
[363]  arXiv:2405.17596 [pdf, other]
Title: GOI: Find 3D Gaussians of Interest with an Optimizable Open-vocabulary Semantic-space Hyperplane
Comments: Our project page is available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[364]  arXiv:2405.17568 [pdf, other]
Title: ExtremeMETA: High-speed Lightweight Image Segmentation Model by Remodeling Multi-channel Metamaterial Imagers
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[365]  arXiv:2405.17532 [pdf, other]
Title: ClassDiffusion: More Aligned Personalization Tuning with Explicit Class Guidance
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[366]  arXiv:2405.17531 [pdf, other]
Title: Evolutive Rendering Models
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[367]  arXiv:2405.17523 [pdf, other]
Title: Locally Testing Model Detections for Semantic Global Concepts
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[368]  arXiv:2405.17475 [pdf, other]
Title: How Culturally Aware are Vision-Language Models?
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
[369]  arXiv:2405.17457 [pdf, other]
Title: Data-Free Federated Class Incremental Learning with Diffusion-Based Generative Memory
Subjects: Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
[370]  arXiv:2405.17456 [pdf, other]
Title: Optimized Linear Measurements for Inverse Problems using Diffusion-Based Image Generation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
[371]  arXiv:2405.17455 [pdf, other]
Title: WeatherFormer: A Pretrained Encoder Model for Learning Robust Weather Representations from Small Datasets
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Atmospheric and Oceanic Physics (physics.ao-ph); Machine Learning (stat.ML)
[372]  arXiv:2405.17450 [pdf, other]
Title: The Power of Next-Frame Prediction for Learning Physical Laws
Comments: 7 Figures, 12 Pages, 1 Table
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[373]  arXiv:2405.17449 [pdf, ps, other]
Title: Image Based Character Recognition, Documentation System To Decode Inscription From Temple
Comments: This research paper is a part of capstone project submitted to VIT Chennai, VIT University
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[374]  arXiv:2405.17447 [pdf, other]
Title: How to train your ViT for OOD Detection
Comments: arXiv admin note: text overlap with arXiv:2306.00826
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[375]  arXiv:2405.17444 [pdf, other]
Title: Towards Gradient-based Time-Series Explanations through a SpatioTemporal Attention Network
Authors: Min Hun Lee
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[376]  arXiv:2405.18418 (cross-list from cs.LG) [pdf, other]
Title: Hierarchical World Models as Visual Whole-Body Humanoid Controllers
Comments: Code and videos at this https URL
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[377]  arXiv:2405.18410 (cross-list from eess.IV) [pdf, other]
Title: Towards a Sampling Theory for Implicit Neural Representations
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[378]  arXiv:2405.18407 (cross-list from cs.LG) [pdf, other]
Title: Phased Consistency Model
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[379]  arXiv:2405.18376 (cross-list from cs.LG) [pdf, other]
Title: Empowering Source-Free Domain Adaptation with MLLM-driven Curriculum Learning
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[380]  arXiv:2405.18358 (cross-list from cs.CL) [pdf, other]
Title: MMCTAgent: Multi-modal Critical Thinking Agent Framework for Complex Visual Reasoning
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[381]  arXiv:2405.18356 (cross-list from eess.IV) [pdf, other]
Title: Universal and Extensible Language-Vision Models for Organ Segmentation and Tumor Detection from Abdominal Computed Tomography
Comments: Accepted to Medical Image Analysis
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[382]  arXiv:2405.18334 (cross-list from cs.DB) [pdf, other]
Title: SketchQL Demonstration: Zero-shot Video Moment Querying with Sketches
Journal-ref: Published on International Conference on Very Large Databases 2024
Subjects: Databases (cs.DB); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[383]  arXiv:2405.18327 (cross-list from q-bio.QM) [pdf, ps, other]
Title: Histopathology Based AI Model Predicts Anti-Angiogenic Therapy Response in Renal Cancer Clinical Trial
Comments: 19 pages, 4 Figures
Subjects: Quantitative Methods (q-bio.QM); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[384]  arXiv:2405.18267 (cross-list from eess.IV) [pdf, other]
Title: CT-based brain ventricle segmentation via diffusion Schrödinger Bridge without target domain ground truths
Comments: Early acceptance at MICCAI2024
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[385]  arXiv:2405.18236 (cross-list from cs.CR) [pdf, other]
Title: Position Paper: Think Globally, React Locally -- Bringing Real-time Reference-based Website Phishing Detection on macOS
Comments: 8 pages, 7 figures, 8 tables. Accepted to STAST'24, 14th International Workshop on Socio-Technical Aspects in Security, Affiliated with the 9th IEEE European Symposium on Security and Privacy, this https URL
Subjects: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[386]  arXiv:2405.18213 (cross-list from cs.SD) [pdf, other]
Title: NeRAF: 3D Scene Infused Neural Radiance and Acoustic Fields
Comments: Project Page: this https URL
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[387]  arXiv:2405.18196 (cross-list from cs.RO) [pdf, other]
Title: Render and Diffuse: Aligning Image and Action Spaces for Diffusion-based Behaviour Cloning
Comments: Robotics: Science and Systems (RSS) 2024. Videos are available on our project webpage at this https URL
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[388]  arXiv:2405.18193 (cross-list from cs.LG) [pdf, other]
Title: In-Context Symmetries: Self-Supervised Learning through Contextual World Models
Comments: 32 pages, 24 tables and 11 figures
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[389]  arXiv:2405.18167 (cross-list from eess.IV) [pdf, other]
Title: Confidence-aware multi-modality learning for eye disease screening
Comments: 27 pages, 7 figures, 9 tables
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[390]  arXiv:2405.18064 (cross-list from cs.AI) [pdf, ps, other]
Title: Automated Real-World Sustainability Data Generation from Images of Buildings
Comments: 6 pages
Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[391]  arXiv:2405.18045 (cross-list from cs.LG) [pdf, other]
Title: Bridging Mini-Batch and Asymptotic Analysis in Contrastive Learning: From InfoNCE to Kernel-Based Losses
Comments: Accepted at ICML 2024. Code available at: this https URL
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[392]  arXiv:2405.17969 (cross-list from cs.CL) [pdf, other]
Title: Knowledge Circuits in Pretrained Transformers
Comments: Work in progress, 25 pages
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR); Machine Learning (cs.LG)
[393]  arXiv:2405.17927 (cross-list from cs.AI) [pdf, other]
Title: The Evolution of Multimodal Model Architectures
Comments: 30 pages, 6 tables, 7 figures
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[394]  arXiv:2405.17811 (cross-list from cs.GR) [pdf, other]
Title: Mani-GS: Gaussian Splatting Manipulation with Triangular Mesh
Comments: Project page here: this https URL
Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV)
[395]  arXiv:2405.17769 (cross-list from cs.RO) [pdf, other]
Title: Microsaccade-inspired Event Camera for Robotics
Comments: Published on Science Robotics June 2024 issue
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[396]  arXiv:2405.17756 (cross-list from eess.IV) [pdf, ps, other]
Title: Motion-Informed Deep Learning for Brain MR Image Reconstruction Framework
Comments: 22 pages, 7 figures, 4 tables
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Medical Physics (physics.med-ph)
[397]  arXiv:2405.17706 (cross-list from cs.AI) [pdf, other]
Title: Video Enriched Retrieval Augmented Generation Using Aligned Video Captions
Authors: Kevin Dela Rosa
Comments: SIGIR 2024 Workshop on Multimodal Representation and Retrieval (MRR 2024)
Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR)
[398]  arXiv:2405.17663 (cross-list from cs.LG) [pdf, other]
Title: What's the Opposite of a Face? Finding Shared Decodable Concepts and their Negations in the Brain
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[399]  arXiv:2405.17659 (cross-list from eess.IV) [pdf, other]
Title: Enhancing Global Sensitivity and Uncertainty Quantification in Medical Image Reconstruction with Monte Carlo Arbitrary-Masked Mamba
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[400]  arXiv:2405.17537 (cross-list from cs.AI) [pdf, other]
Title: BIOSCAN-CLIP: Bridging Vision and Genomics for Biodiversity Monitoring at Scale
Comments: 16 pages with 9 figures
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
[401]  arXiv:2405.17533 (cross-list from cs.AI) [pdf, other]
Title: PAE: LLM-based Product Attribute Extraction for E-Commerce Fashion Trends
Comments: Attribute Extraction, PDF files, Bert Embedding, Hashtag, Large Language Model (LLM), Text and Images
Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[402]  arXiv:2405.17520 (cross-list from eess.IV) [pdf, other]
Title: Advancing Medical Image Segmentation with Mini-Net: A Lightweight Solution Tailored for Efficient Segmentation of Medical Images
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[403]  arXiv:2405.17518 (cross-list from eess.IV) [pdf, other]
Title: Assessment of Left Atrium Motion Deformation Through Full Cardiac Cycle
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[404]  arXiv:2405.17517 (cross-list from cs.LG) [pdf, other]
Title: WASH: Train your Ensemble with Communication-Efficient Weight Shuffling, then Average
Authors: Louis Fournier (MLIA), Adel Nabli (MLIA, Mila), Masih Aminbeidokhti (ETS), Marco Pedersoli (ETS), Eugene Belilovsky (Mila), Edouard Oyallon
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
[405]  arXiv:2405.17506 (cross-list from cs.LG) [pdf, other]
Title: Subspace Node Pruning
Comments: 14 pages, 8 figures
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)
[406]  arXiv:2405.17484 (cross-list from cs.LG) [pdf, other]
Title: Bridging The Gap between Low-rank and Orthogonal Adaptation via Householder Reflection Adaptation
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[407]  arXiv:2405.17472 (cross-list from cs.LG) [pdf, other]
Title: FreezeAsGuard: Mitigating Illegal Adaptation of Diffusion Models via Selective Tensor Freezing
Authors: Kai Huang, Wei Gao
Comments: 18 pages
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
[408]  arXiv:2405.17461 (cross-list from cs.LG) [pdf, other]
Title: EMR-Merging: Tuning-Free High-Performance Model Merging
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[409]  arXiv:2405.17460 (cross-list from cs.LG) [pdf, ps, other]
Title: Investigation of Customized Medical Decision Algorithms Utilizing Graph Neural Networks
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[410]  arXiv:2405.17459 (cross-list from cs.LG) [pdf, ps, other]
Title: Integrating Medical Imaging and Clinical Reports Using Multimodal Deep Learning for Advanced Disease Analysis
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
[411]  arXiv:2405.17446 (cross-list from eess.IV) [pdf, ps, other]
Title: Whole Slide Image Survival Analysis Using Histopathological Feature Extractors
Comments: 4 pages, preliminary results exploring UNI feature extractor, will attempt to gather more results and check again for correctness and consistency
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[412]  arXiv:2405.17445 (cross-list from cs.LG) [pdf, other]
Title: On margin-based generalization prediction in deep neural networks
Authors: Coenraad Mouton
Comments: PhD Thesis
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Tue, 28 May 2024

[413]  arXiv:2405.17430 [pdf, other]
Title: Matryoshka Multimodal Models
Comments: Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
[414]  arXiv:2405.17429 [pdf, other]
Title: GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction
Comments: Code is available at: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[415]  arXiv:2405.17427 [pdf, other]
Title: Reason3D: Searching and Reasoning 3D Segmentation via Large Language Model
Comments: Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[416]  arXiv:2405.17426 [pdf, other]
Title: Benchmarking and Improving Bird's Eye View Perception Robustness in Autonomous Driving
Comments: Preprint; 17 pages, 13 figures, 11 tables; Code at this https URL: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[417]  arXiv:2405.17424 [pdf, other]
Title: LARM: Large Auto-Regressive Model for Long-Horizon Embodied Intelligence
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[418]  arXiv:2405.17423 [pdf, other]
Title: Privacy-Aware Visual Language Models
Comments: preprint
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
[419]  arXiv:2405.17422 [pdf, other]
Title: Hardness-Aware Scene Synthesis for Semi-Supervised 3D Object Detection
Comments: Code is available at: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[420]  arXiv:2405.17421 [pdf, other]
Title: MoSca: Dynamic Gaussian Fusion from Casual Videos via 4D Motion Scaffolds
Comments: project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
[421]  arXiv:2405.17419 [pdf, other]
Title: MultiOOD: Scaling Out-of-Distribution Detection for Multiple Modalities
Comments: Code and MultiOOD benchmark: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[422]  arXiv:2405.17418 [pdf, other]
Title: Self-Corrected Multimodal Large Language Model for End-to-End Robot Manipulation
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[423]  arXiv:2405.17414 [pdf, other]
Title: Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
[424]  arXiv:2405.17405 [pdf, other]
Title: Human4DiT: Free-view Human Video Generation with 4D Diffusion Transformer
Comments: Our project website is this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[425]  arXiv:2405.17398 [pdf, other]
Title: Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability
Comments: Code and model: this https URL, video demos: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[426]  arXiv:2405.17397 [pdf, other]
Title: Occlusion Handling in 3D Human Pose Estimation with Perturbed Positional Encoding
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[427]  arXiv:2405.17393 [pdf, other]
Title: EASI-Tex: Edge-Aware Mesh Texturing from Single Image
Comments: ACM Transactions on Graphics (Proceedings of SIGGRAPH), 2024. Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[428]  arXiv:2405.17369 [pdf, ps, other]
Title: Predict joint angle of body parts based on sequence pattern recognition
Journal-ref: 2022 16th International Conference on Ubiquitous Information Management and Communication (IMCOM)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[429]  arXiv:2405.17368 [pdf, other]
Title: Fusing uncalibrated IMUs and handheld smartphone video to reconstruct knee kinematics
Comments: Accepted to International Conference on Biomedical Robotics and Biomechatronics 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[430]  arXiv:2405.17351 [pdf, other]
Title: DOF-GS: Adjustable Depth-of-Field 3D Gaussian Splatting for Refocusing, Defocus Rendering and Blur Removal
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[431]  arXiv:2405.17323 [pdf, other]
Title: Tracking Small Birds by Detection Candidate Region Filtering and Detection History-aware Association
Comments: 6 pages, 6 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[432]  arXiv:2405.17315 [pdf, other]
Title: All-day Depth Completion
Comments: 8 pages, 4 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[433]  arXiv:2405.17306 [pdf, other]
Title: Controllable Longer Image Animation with Diffusion Models
Comments: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[434]  arXiv:2405.17262 [pdf, other]
Title: Deep Feature Gaussian Processes for Single-Scene Aerosol Optical Depth Reconstruction
Comments: Accepted to IEEE GEOSCIENCE AND REMOTE SENSING LETTERS
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[435]  arXiv:2405.17251 [pdf, other]
Title: GenWarp: Single Image to Novel Views with Semantic-Preserving Generative Warping
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[436]  arXiv:2405.17241 [pdf, other]
Title: NeurTV: Total Variation on the Neural Domain
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[437]  arXiv:2405.17240 [pdf, other]
Title: Content-Style Decoupling for Unsupervised Makeup Transfer without Generating Pseudo Ground Truth
Comments: Accepted by CVPR2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[438]  arXiv:2405.17201 [pdf, other]
Title: Diagnosing the Compositional Knowledge of Vision Language Models from a Game-Theoretic View
Comments: 21 pages, 8 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[439]  arXiv:2405.17191 [pdf, other]
Title: MCGAN: Enhancing GAN Training with Regression-Based Generator Loss
Subjects: Computer Vision and Pattern Recognition (cs.CV); Probability (math.PR)
[440]  arXiv:2405.17188 [pdf, other]
Title: The SkatingVerse Workshop & Challenge: Methods and Results
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[441]  arXiv:2405.17187 [pdf, other]
Title: Memorize What Matters: Emergent Scene Decomposition from Multitraverse
Comments: Project page: this https URL; Code and data: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)
[442]  arXiv:2405.17158 [pdf, other]
Title: PatchScaler: An Efficient Patch-independent Diffusion Model for Super-Resolution
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[443]  arXiv:2405.17149 [pdf, other]
Title: LCM: Locally Constrained Compact Point Cloud Model for Masked Point Modeling
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[444]  arXiv:2405.17146 [pdf, other]
Title: Compressed-Language Models for Understanding Compressed File Formats: a JPEG Exploration
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[445]  arXiv:2405.17140 [pdf, other]
Title: SDL-MVS: View Space and Depth Deformable Learning Paradigm for Multi-View Stereo Reconstruction in Remote Sensing
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[446]  arXiv:2405.17139 [pdf, other]
Title: Synergy and Diversity in CLIP: Enhancing Performance Through Adaptive Backbone Ensembling
Comments: arXiv admin note: substantial text overlap with arXiv:2312.14400
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[447]  arXiv:2405.17137 [pdf, other]
Title: Jump-teaching: Ultra Efficient and Robust Learning with Noisy Label
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[448]  arXiv:2405.17136 [pdf, other]
Title: PanoTree: Autonomous Photo-Spot Explorer in Virtual Reality Scenes
Comments: 12 pages, 12 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
[449]  arXiv:2405.17110 [pdf, other]
Title: Superpixelwise Low-rank Approximation based Partial Label Learning for Hyperspectral Image Classification
Journal-ref: IEEE Geoscience and Remote Sensing Letters, 2023
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[450]  arXiv:2405.17104 [pdf, other]
Title: LLM-Optic: Unveiling the Capabilities of Large Language Models for Universal Visual Grounding
Comments: Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[451]  arXiv:2405.17102 [pdf, other]
Title: DINO-SD: Champion Solution for ICRA 2024 RoboDepth Challenge
Comments: Outstanding Champion in the RoboDepth Challenge (ICRA24) this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[452]  arXiv:2405.17097 [pdf, ps, other]
Title: Evaluation of Multi-task Uncertainties in Joint Semantic Segmentation and Monocular Depth Estimation
Comments: Submitted to Forum Bildverarbeitung 2024. arXiv admin note: substantial text overlap with arXiv:2402.10580
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[453]  arXiv:2405.17083 [pdf, other]
Title: F-3DGS: Factorized Coordinates and Representations for 3D Gaussian Splatting
Comments: Our project page including code is available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[454]  arXiv:2405.17082 [pdf, other]
Title: Ensembling Diffusion Models via Adaptive Feature Aggregation
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[455]  arXiv:2405.17074 [pdf, other]
Title: Towards Ultra-High-Definition Image Deraining: A Benchmark and An Efficient Method
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[456]  arXiv:2405.17069 [pdf, other]
Title: Training-free Editioning of Text-to-Image Models
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[457]  arXiv:2405.17037 [pdf, other]
Title: BDC-Occ: Binarized Deep Convolution Unit For Binarized Occupancy Network
Comments: 19 pages, 8 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[458]  arXiv:2405.17030 [pdf, ps, other]
Title: SCaRL- A Synthetic Multi-Modal Dataset for Autonomous Driving
Comments: Accepted at the International Conference on Microwaves for Intelligent Mobility, 16-17 April 2024, Boppard near Koblenz, Germany
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[459]  arXiv:2405.17022 [pdf, other]
Title: Compositional Few-Shot Class-Incremental Learning
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[460]  arXiv:2405.17016 [pdf, other]
Title: $\text{Di}^2\text{Pose}$: Discrete Diffusion Model for Occluded 3D Human Pose Estimation
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[461]  arXiv:2405.17013 [pdf, other]
Title: MotionLLM: Multimodal Motion-Language Learning with Large Language Models
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[462]  arXiv:2405.17004 [pdf, other]
Title: Efficient Visual Fault Detection for Freight Train via Neural Architecture Search with Data Volume Robustness
Comments: 11 pages, 8 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[463]  arXiv:2405.17002 [pdf, other]
Title: UIT-DarkCow team at ImageCLEFmedical Caption 2024: Diagnostic Captioning for Radiology Images Efficiency with Transformer Models
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[464]  arXiv:2405.16996 [pdf, other]
Title: Mitigating Noisy Correspondence by Geometrical Structure Consistency Learning
Comments: 10 pages, 5 figures, received by IEEE/CVF Computer Science and Pattern Recognition
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[465]  arXiv:2405.16980 [pdf, other]
Title: DSU-Net: Dynamic Snake U-Net for 2-D Seismic First Break Picking
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[466]  arXiv:2405.16973 [pdf, other]
Title: Collective Perception Datasets for Autonomous Driving: A Comprehensive Review
Comments: Accepted at IEEE IV 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[467]  arXiv:2405.16960 [pdf, other]
Title: DCPI-Depth: Explicitly Infusing Dense Correspondence Prior to Unsupervised Monocular Depth Estimation
Comments: 13 pages, 7 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[468]  arXiv:2405.16959 [pdf, other]
Title: A Machine Learning Approach to Analyze the Effects of Alzheimer's Disease on Handwriting through Lognormal Features
Journal-ref: IGS 2023. Lecture Notes in Computer Science, vol 14285. Springer (2023)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[469]  arXiv:2405.16953 [pdf, other]
Title: Evaluation of Resource-Efficient Crater Detectors on Embedded Systems
Comments: Accepted at 2024 IEEE International Geoscience and Remote Sensing Symposium
Subjects: Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)
[470]  arXiv:2405.16947 [pdf, other]
Title: Zero-Shot Video Semantic Segmentation based on Pre-Trained Diffusion Models
Comments: Project webpage: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[471]  arXiv:2405.16940 [pdf, other]
Title: Adversarial Attacks on Both Face Recognition and Face Anti-spoofing Models
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[472]  arXiv:2405.16934 [pdf, other]
Title: Do Vision-Language Transformers Exhibit Visual Commonsense? An Empirical Study of VCR
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[473]  arXiv:2405.16930 [pdf, other]
Title: From Obstacle to Opportunity: Enhancing Semi-supervised Learning with Synthetic Data
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[474]  arXiv:2405.16925 [pdf, other]
Title: OED: Towards One-stage End-to-End Dynamic Scene Graph Generation
Comments: Accepted by CVPR'24
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[475]  arXiv:2405.16923 [pdf, other]
Title: SA-GS: Semantic-Aware Gaussian Splatting for Large Scene Reconstruction with Geometry Constrain
Comments: Might need more comparisons; will be added later
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[476]  arXiv:2405.16919 [pdf, other]
Title: VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[477]  arXiv:2405.16915 [pdf, other]
Title: Multilingual Diversity Improves Vision-Language Representations
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[478]  arXiv:2405.16909 [pdf, other]
Title: A Cross-Dataset Study for Text-based 3D Human Motion Retrieval
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[479]  arXiv:2405.16895 [pdf, other]
Title: Anonymization Prompt Learning for Facial Privacy-Preserving Text-to-Image Generation
Comments: 15 pages, 8 figures and 5 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[480]  arXiv:2405.16890 [pdf, other]
Title: PivotMesh: Generic 3D Mesh Generation via Pivot Vertices Guidance
Comments: Project website: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[481]  arXiv:2405.16886 [pdf, other]
Title: Hawk: Learning to Understand Open-World Video Anomalies
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[482]  arXiv:2405.16874 [pdf, other]
Title: CoCoGesture: Toward Coherent Co-speech 3D Gesture Generation in the Wild
Comments: The dataset will be released as soon as possible
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[483]  arXiv:2405.16873 [pdf, other]
Title: ContrastAlign: Toward Robust BEV Feature Alignment via Contrastive Learning for Multi-Modal 3D Object Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[484]  arXiv:2405.16868 [pdf, other]
Title: RCDN: Towards Robust Camera-Insensitivity Collaborative Perception via Dynamic Feature-based 3D Neural Modeling
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[485]  arXiv:2405.16860 [pdf, other]
Title: Think Before You Act: A Two-Stage Framework for Mitigating Gender Bias Towards Vision-Language Tasks
Comments: Accepted to NAACL 2024 (main)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[486]  arXiv:2405.16858 [pdf, other]
Title: Estimating Depth of Monocular Panoramic Image with Teacher-Student Model Fusing Equirectangular and Spherical Representations
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[487]  arXiv:2405.16849 [pdf, other]
Title: Sync4D: Video Guided Controllable Dynamics for Physics-Based 4D Generation
Comments: Our project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[488]  arXiv:2405.16848 [pdf, other]
Title: A re-calibration method for object detection with multi-modal alignment bias in autonomous driving
Comments: 10 pages, 6 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[489]  arXiv:2405.16847 [pdf, other]
Title: TokenUnify: Scalable Autoregressive Visual Pre-training with Mixture Token Prediction
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[490]  arXiv:2405.16829 [pdf, other]
Title: PyGS: Large-scale Scene Representation with Pyramidal 3D Gaussian Splatting
Authors: Zipeng Wang, Dan Xu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[491]  arXiv:2405.16823 [pdf, other]
Title: Unified Editing of Panorama, 3D Scenes, and Videos Through Disentangled Self-Attention Injection
Comments: Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[492]  arXiv:2405.16822 [pdf, other]
Title: Vidu4D: Single Generated Video to High-Fidelity 4D Reconstruction with Dynamic Gaussian Surfels
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[493]  arXiv:2405.16817 [pdf, other]
Title: Controlling Rate, Distortion, and Realism: Towards a Single Comprehensive Neural Image Compression Model
Comments: WACV2024 Oral. Code is at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[494]  arXiv:2405.16815 [pdf, other]
Title: Image-level Regression for Uncertainty-aware Retinal Image Segmentation
Comments: 13 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[495]  arXiv:2405.16813 [pdf, other]
Title: SiNGR: Brain Tumor Segmentation via Signed Normalized Geodesic Transform Regression
Comments: Accepted as a conference paper at MICCAI 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[496]  arXiv:2405.16807 [pdf, other]
Title: Extreme Compression of Adaptive Neural Images
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Multimedia (cs.MM)
[497]  arXiv:2405.16803 [pdf, other]
Title: TIE: Revolutionizing Text-based Image Editing for Complex-Prompt Following and High-Fidelity Editing
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[498]  arXiv:2405.16796 [pdf, other]
Title: DualContrast: Unsupervised Disentangling of Content and Transformations with Implicit Parameterization
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[499]  arXiv:2405.16790 [pdf, other]
Title: SCSim: A Realistic Spike Cameras Simulator
Comments: Accepted by ICME2024. arXiv admin note: substantial text overlap with arXiv:2304.03129
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[500]  arXiv:2405.16788 [pdf, other]
Title: 3D Reconstruction with Fast Dipole Sums
Comments: The ancillary files include an HTML supplement with interactive visualizations of all experiment results at "supplement/index.html". To download the supplement as a single archive, see "supplement.7z"
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[501]  arXiv:2405.16785 [pdf, other]
Title: PromptFix: You Prompt and We Fix the Photo
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[502]  arXiv:2405.16766 [pdf, other]
Title: Reframing the Relationship in Out-of-Distribution Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[503]  arXiv:2405.16761 [pdf, other]
Title: Masked Face Recognition with Generative-to-Discriminative Representations
Comments: Accepted by International Conference on Machine Learning 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[504]  arXiv:2405.16759 [pdf, other]
Title: Greedy Growing Enables High-Resolution Pixel-Based Diffusion Models
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[505]  arXiv:2405.16748 [pdf, ps, other]
Title: Hypergraph Laplacian Eigenmaps and Face Recognition Problems
Authors: Loc Hoang Tran
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[506]  arXiv:2405.16740 [pdf, other]
Title: PP-SAM: Perturbed Prompts for Robust Adaptation of Segment Anything Model for Polyp Segmentation
Comments: 7 pages, 9 figures, Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[507]  arXiv:2405.16738 [pdf, other]
Title: CARL: A Framework for Equivariant Image Registration
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[508]  arXiv:2405.16728 [pdf, other]
Title: Towards Multi-Task Multi-Modal Models: A Video Generative Perspective
Authors: Lijun Yu
Comments: PhD thesis
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM)
[509]  arXiv:2405.16701 [pdf, other]
Title: Detail-Enhanced Intra- and Inter-modal Interaction for Audio-Visual Emotion Recognition
Comments: Submitted to 27th International Conference of Pattern Recognition (ICPR 2024)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[510]  arXiv:2405.16700 [pdf, other]
Title: Implicit Multimodal Alignment: On the Generalization of Frozen LLMs to Multimodal Inputs
Comments: Project page: this https URL. 37 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
[511]  arXiv:2405.16683 [pdf, other]
Title: Toward Digitalization: A Secure Approach to Find a Missing Person Using Facial Recognition Technology
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY); Machine Learning (cs.LG)
[512]  arXiv:2405.16645 [pdf, other]
Title: Diffusion4D: Fast Spatial-temporal Consistent 4D Generation via Video Diffusion Models
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[513]  arXiv:2405.16628 [pdf, other]
Title: Competing for pixels: a self-play algorithm for weakly-supervised segmentation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[514]  arXiv:2405.16625 [pdf, other]
Title: Few-shot Tuning of Foundation Models for Class-incremental Learning
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[515]  arXiv:2405.16610 [pdf, other]
Title: The devil is in discretization discrepancy. Robustifying Differentiable NAS with Single-Stage Searching Protocol
Comments: Published in CVPR-NAS 2024 workshop
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
[516]  arXiv:2405.16605 [pdf, other]
Title: Demystify Mamba in Vision: A Linear Attention Perspective
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[517]  arXiv:2405.16600 [pdf, other]
Title: Image-Text-Image Knowledge Transferring for Lifelong Person Re-Identification with Hybrid Clothing States
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[518]  arXiv:2405.16597 [pdf, other]
Title: Content and Salient Semantics Collaboration for Cloth-Changing Person Re-Identification
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[519]  arXiv:2405.16596 [pdf, other]
Title: Protect-Your-IP: Scalable Source-Tracing and Attribution against Personalized Generation
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[520]  arXiv:2405.16591 [pdf, other]
Title: CapS-Adapter: Caption-based MultiModal Adapter in Zero-Shot Classification
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[521]  arXiv:2405.16580 [pdf, other]
Title: A Study on Unsupervised Anomaly Detection and Defect Localization using Generative Model in Ultrasonic Non-Destructive Testing
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
[522]  arXiv:2405.16573 [pdf, other]
Title: FRCNet Frequency and Region Consistency for Semi-supervised Medical Image Segmentation
Comments: MICCAI 2024 Early Accept
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[523]  arXiv:2405.16570 [pdf, other]
Title: ID-to-3D: Expressive ID-guided 3D Heads via Score Distillation Sampling
Comments: Explore our 3D results at: this https URL; fixed broken URL to project page
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[524]  arXiv:2405.16555 [pdf, other]
Title: vHeat: Building Vision Models upon Heat Conduction
Comments: 18 pages, 10 figures, 9 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[525]  arXiv:2405.16544 [pdf, other]
Title: Splat-SLAM: Globally Optimized RGB-only SLAM with 3D Gaussians
Comments: 21 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[526]  arXiv:2405.16538 [pdf, other]
Title: Gamified AI Approch for Early Detection of Dementia
Comments: 50 Pages, 29 Figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[527]  arXiv:2405.16537 [pdf, other]
Title: I2VEdit: First-Frame-Guided Video Editing via Image-to-Video Diffusion Models
Comments: 19 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[528]  arXiv:2405.16534 [pdf, other]
Title: Pruning for Robust Concept Erasing in Diffusion Models
Comments: Under review
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[529]  arXiv:2405.16517 [pdf, other]
Title: Sp2360: Sparse-view 360 Scene Reconstruction using Cascaded 2D Diffusion Priors
Comments: 18 pages, 10 figures, 4 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[530]  arXiv:2405.16501 [pdf, other]
Title: User-Friendly Customized Generation with Multi-Modal Prompts
Comments: 11 pages, 8 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[531]  arXiv:2405.16496 [pdf, other]
Title: Exploring a Multimodal Fusion-based Deep Learning Network for Detecting Facial Palsy
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[532]  arXiv:2405.16493 [pdf, other]
Title: Flow Snapshot Neurons in Action: Deep Neural Networks Generalize to Biological Motion Perception
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[533]  arXiv:2405.16488 [pdf, ps, other]
Title: Partial train and isolate, mitigate backdoor attack
Authors: Yong Li, Han Gao
Comments: 9 pages, 2 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[534]  arXiv:2405.16486 [pdf, other]
Title: Decomposing the Neurons: Activation Sparsity via Mixture of Experts for Continual Test Time Adaptation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[535]  arXiv:2405.16479 [pdf, other]
Title: Differentiable Proximal Graph Matching
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[536]  arXiv:2405.16478 [pdf, other]
Title: Vision-Based Approach for Food Weight Estimation from 2D Images
Comments: Six pages, six figures. The final version of this paper is published in an IEEE conference
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[537]  arXiv:2405.16473 [pdf, other]
Title: M$^3$CoT: A Novel Benchmark for Multi-Domain Multi-step Multi-modal Chain-of-Thought
Comments: Accepted at ACL2024 Main Conference
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[538]  arXiv:2405.16470 [pdf, other]
Title: Image Deraining with Frequency-Enhanced State Space Model
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[539]  arXiv:2405.16451 [pdf, other]
Title: From Macro to Micro: Boosting micro-expression recognition via pre-training on macro-expression videos
Comments: 18 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[540]  arXiv:2405.16443 [pdf, other]
Title: 3D View Optimization for Improving Image Aesthetics
Comments: 10 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
[541]  arXiv:2405.16437 [pdf, other]
Title: Incremental Pseudo-Labeling for Black-Box Unsupervised Domain Adaptation
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[542]  arXiv:2405.16426 [pdf, other]
Title: Segmentation of Maya hieroglyphs through fine-tuned foundation models
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[543]  arXiv:2405.16419 [pdf, other]
Title: Enhancing Feature Diversity Boosts Channel-Adaptive Vision Transformers
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[544]  arXiv:2405.16417 [pdf, other]
Title: CRoFT: Robust Fine-Tuning with Concurrent Optimization for OOD Generalization and Open-Set OOD Detection
Comments: Accepted by ICML2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[545]  arXiv:2405.16414 [pdf, other]
Title: PPRSteg: Printing and Photography Robust QR Code Steganography via Attention Flow-Based Model
Comments: 9 content pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[546]  arXiv:2405.16401 [pdf, other]
Title: Understanding the Effect of using Semantically Meaningful Tokens for Visual Representation Learning
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[547]  arXiv:2405.16393 [pdf, other]
Title: Disentangling Foreground and Background Motion for Enhanced Realism in Human Video Generation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[548]  arXiv:2405.16382 [pdf, other]
Title: Video Prediction Models as General Visual Encoders
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[549]  arXiv:2405.16341 [pdf, other]
Title: R.A.C.E.: Robust Adversarial Concept Erasure for Secure Text-to-Image Diffusion Model
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[550]  arXiv:2405.16330 [pdf, other]
Title: LEAST: "Local" text-conditioned image style transfer
Comments: Accepted to AI for Content Creation (AI4CC) Workshop at CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[551]  arXiv:2405.16328 [pdf, other]
Title: A Classifier-Free Incremental Learning Framework for Scalable Medical Image Segmentation
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[552]  arXiv:2405.16301 [pdf, other]
Title: Active Learning for Finely-Categorized Image-Text Retrieval by Selecting Hard Negative Unpaired Samples
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[553]  arXiv:2405.16296 [pdf, other]
Title: Neural Network-Based Tracking and 3D Reconstruction of Baseball Pitch Trajectories from Single-View 2D Video
Authors: Jhen Hsieh
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Multimedia (cs.MM)
[554]  arXiv:2405.16273 [pdf, other]
Title: M$^3$GPT: An Advanced Multimodal, Multitask Framework for Motion Comprehension and Generation
Comments: 18 pages, 6 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[555]  arXiv:2405.16263 [pdf, other]
Title: Assessing Image Inpainting via Re-Inpainting Self-Consistency Evaluation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[556]  arXiv:2405.16260 [pdf, other]
Title: Enhancing Consistency-Based Image Generation via Adversarialy-Trained Classification and Energy-Based Discrimination
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[557]  arXiv:2405.16234 [pdf, other]
Title: Vision Language Models for Spreadsheet Understanding: Challenges and Opportunities
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[558]  arXiv:2405.16226 [pdf, other]
Title: Detecting Adversarial Data via Perturbation Forgery
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[559]  arXiv:2405.16220 [pdf, ps, other]
Title: DAFFNet: A Dual Attention Feature Fusion Network for Classification of White Blood Cells
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[560]  arXiv:2405.16214 [pdf, other]
Title: Underwater Image Enhancement by Diffusion Model with Customized CLIP-Classifier
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[561]  arXiv:2405.16213 [pdf, other]
Title: Learning Visual-Semantic Subspace Representations for Propositional Reasoning
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[562]  arXiv:2405.16204 [pdf, other]
Title: VOODOO XP: Expressive One-Shot Head Reenactment for VR Telepresence
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR)
[563]  arXiv:2405.16200 [pdf, other]
Title: FlightPatchNet: Multi-Scale Patch Network with Differential Coding for Flight Trajectory Prediction
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[564]  arXiv:2405.16197 [pdf, other]
Title: A 7K Parameter Model for Underwater Image Enhancement based on Transmission Map Prior
Comments: 10 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[565]  arXiv:2405.16181 [pdf, other]
Title: Enhancing Adversarial Transferability Through Neighborhood Conditional Sampling
Comments: Under review
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[566]  arXiv:2405.16152 [pdf, other]
Title: SuDA: Support-based Domain Adaptation for Sim2Real Motion Capture with Flexible Sensors
Comments: 20 pages conference, accepted ICML paper
Subjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
[567]  arXiv:2405.16146 [pdf, other]
Title: Dual-Adapter: Training-free Dual Adaptation for Few-shot Out-of-Distribution Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[568]  arXiv:2405.16144 [pdf, other]
Title: GreenCOD: A Green Camouflaged Object Detection Method
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[569]  arXiv:2405.16134 [pdf, other]
Title: Breaking the False Sense of Security in Backdoor Defense through Re-Activation Attack
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[570]  arXiv:2405.16116 [pdf, other]
Title: Real-Time Scene Graph Generation
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[571]  arXiv:2405.16108 [pdf, other]
Title: OmniBind: Teach to Build Unequal-Scale Modality Interaction for Omni-Bind of All
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[572]  arXiv:2405.16105 [pdf, other]
Title: MambaLLIE: Implicit Retinex-Aware Low Light Enhancement with Global-then-Local State Space
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[573]  arXiv:2405.16099 [pdf, other]
Title: Improving 3D Occupancy Prediction through Class-balancing Loss and Multi-scale Representation
Comments: 5 pages, 3 figures, accepted by IEEE CAI 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[574]  arXiv:2405.16098 [pdf, other]
Title: Lateralization MLP: A Simple Brain-inspired Architecture for Diffusion
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[575]  arXiv:2405.16096 [pdf, other]
Title: MINet: Multi-scale Interactive Network for Real-time Salient Object Detection of Strip Steel Surface Defects
Comments: Accepted by IEEE Transactions on Industrial Informatics
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[576]  arXiv:2405.16094 [pdf, other]
Title: PLUG: Revisiting Amodal Segmentation with Foundation Model and Hierarchical Focus
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[577]  arXiv:2405.16093 [pdf, other]
Title: Diverse Teacher-Students for Deep Safe Semi-Supervised Learning under Class Mismatch
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[578]  arXiv:2405.16091 [pdf, other]
Title: Enhancing Near OOD Detection in Prompt Learning: Maximum Gains, Minimal Costs
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[579]  arXiv:2405.16085 [pdf, other]
Title: Deep-PE: A Learning-Based Pose Evaluator for Point Cloud Registration
Comments: 22 pages, 16 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[580]  arXiv:2405.16082 [pdf, ps, other]
Title: Uncertainty Measurement of Deep Learning System based on the Convex Hull of Training Sets
Comments: 11 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[581]  arXiv:2405.16071 [pdf, other]
Title: DynRefer: Delving into Region-level Multi-modality Tasks via Dynamic Resolution
Comments: Code is available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[582]  arXiv:2405.16038 [pdf, other]
Title: Rethinking Early-Fusion Strategies for Improved Multispectral Object Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[583]  arXiv:2405.16034 [pdf, other]
Title: DiffuBox: Refining 3D Object Detection with Point Diffusion
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[584]  arXiv:2405.16016 [pdf, other]
Title: ComFace: Facial Representation Learning with Synthetic Data for Comparing Faces
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[585]  arXiv:2405.16009 [pdf, other]
Title: Streaming Long Video Understanding with Large Language Models
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[586]  arXiv:2405.16008 [pdf, other]
Title: Intensity and Texture Correction of Omnidirectional Image Using Camera Images for Indirect Augmented Reality
Comments: International Workshop on Frontiers of Computer Vision (IW-FCV2024)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
[587]  arXiv:2405.16005 [pdf, other]
Title: PTQ4DiT: Post-training Quantization for Diffusion Transformers
Comments: 12 pages, 6 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[588]  arXiv:2405.15996 [pdf, other]
Title: Selfie Taking with Facial Expression Recognition Using Omni-directional Camera
Comments: International Workshop on Frontiers of Computer Vision (IW-FCV2024)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[589]  arXiv:2405.15995 [pdf, other]
Title: Efficient Temporal Action Segmentation via Boundary-aware Query Voting
Comments: 17 pages, 8 figures, 11 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[590]  arXiv:2405.15989 [pdf, other]
Title: TreeFormers -- An Exploration of Vision Transformers for Deforestation Driver Classification
Authors: Uche Ochuba
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[591]  arXiv:2405.15976 [pdf, other]
Title: Understanding the Impact of Training Set Size on Animal Re-identification
Subjects: Computer Vision and Pattern Recognition (cs.CV); Populations and Evolution (q-bio.PE)
[592]  arXiv:2405.15973 [pdf, other]
Title: Enhancing Visual-Language Modality Alignment in Large Vision Language Models via Self-Improvement
Comments: 15 pages, 8 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
[593]  arXiv:2405.15965 [pdf, other]
Title: What is a Goldilocks Face Verification Test Set?
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[594]  arXiv:2405.15962 [pdf, ps, other]
Title: Wearable-based behaviour interpolation for semi-supervised human activity recognition
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[595]  arXiv:2405.15961 [pdf, other]
Title: Grounding Stylistic Domain Generalization with Quantitative Domain Shift Measures and Synthetic Scene Images
Comments: Accepted at the 3rd CVPR Workshop on Vision Datasets Understanding
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[596]  arXiv:2405.15953 [pdf, other]
Title: Activator: GLU Activations as The Core Functions of a Vision Transformer
Comments: arXiv admin note: substantial text overlap with arXiv:2403.02411
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[597]  arXiv:2405.15939 [pdf, other]
Title: Diversifying Human Pose in Synthetic Data for Aerial-view Human Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[598]  arXiv:2405.15932 [pdf, other]
Title: Steerable Transformers
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[599]  arXiv:2405.15916 [pdf, other]
Title: Recasting Generic Pretrained Vision Transformers As Object-Centric Scene Encoders For Manipulation Policies
Comments: Accepted to International Conference on Robotics and Automation (ICRA) 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[600]  arXiv:2405.15914 [pdf, other]
Title: ExactDreamer: High-Fidelity Text-to-3D Content Creation via Exact Score Matching
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[601]  arXiv:2405.15891 [pdf, other]
Title: Score Distillation via Reparametrized DDIM
Comments: Preprint. Some browsers might incorrectly display images
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
[602]  arXiv:2405.15886 [pdf, other]
Title: A Neurosymbolic Framework for Bias Correction in CNNs
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[603]  arXiv:2405.15881 [pdf, other]
Title: Scaling Diffusion Mamba with Bidirectional SSMs for Efficient Image and Video Generation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[604]  arXiv:2405.15860 [pdf, other]
Title: Free Performance Gain from Mixing Multiple Partially Labeled Samples in Multi-label Image Classification
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[605]  arXiv:2405.15843 [pdf, other]
Title: SpotNet: An Image Centric, Lidar Anchored Approach To Long Range Perception
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[606]  arXiv:2405.15827 [pdf, other]
Title: Efficient Point Transformer with Dynamic Token Aggregating for Point Cloud Processing
Authors: Dening Lu, Jun Zhou, Kyle (Yilin) Gao, Linlin Xu, Jonathan Li
Comments: 16 pages, 12 figures, 7 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[607]  arXiv:2405.15826 [pdf, other]
Title: 3D Learnable Supertoken Transformer for LiDAR Point Cloud Scene Segmentation
Comments: 13 pages, 10 figures, 7 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[608]  arXiv:2405.15817 [pdf, other]
Title: Rethinking the Elementary Function Fusion for Single-Image Dehazing
Authors: Yesian Rohn
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[609]  arXiv:2405.15813 [pdf, other]
Title: From CNNs to Transformers in Multimodal Human Action Recognition: A Survey
Comments: 23 pages, 5 figures and 3 tables. To appear in ACM Trans. Multimedia Comput. Commun. Appl. (TOMM) 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[610]  arXiv:2405.15780 [pdf, other]
Title: Sequence Length Scaling in Vision Transformers for Scientific Images on Frontier
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[611]  arXiv:2405.17416 (cross-list from cs.LG) [pdf, other]
Title: A Recipe for Unbounded Data Augmentation in Visual Reinforcement Learning
Comments: Accepted at RLC 2024
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[612]  arXiv:2405.17401 (cross-list from cs.LG) [pdf, other]
Title: RB-Modulation: Training-Free Personalization of Diffusion Models using Stochastic Optimal Control
Comments: Preprint. Under review
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[613]  arXiv:2405.17278 (cross-list from cs.RO) [pdf, ps, other]
Title: EF-Calib: Spatiotemporal Calibration of Event- and Frame-Based Cameras Using Continuous-Time Trajectories
Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[614]  arXiv:2405.17267 (cross-list from cs.LG) [pdf, other]
Title: FedHPL: Efficient Heterogeneous Federated Learning with Prompt Tuning and Logit Distillation
Comments: 35 pages
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[615]  arXiv:2405.17261 (cross-list from eess.IV) [pdf, other]
Title: Does Diffusion Beat GAN in Image Super Resolution?
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[616]  arXiv:2405.17260 (cross-list from cs.LG) [pdf, other]
Title: Accelerating Simulation of Two-Phase Flows with Neural PDE Surrogates
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Fluid Dynamics (physics.flu-dyn)
[617]  arXiv:2405.17257 (cross-list from cs.CG) [pdf, other]
Title: Surface reconstruction of sampled textiles via Morse theory
Comments: 40 pages, 17 figures, 1 table, 1 algorithm, 1 appendix
Subjects: Computational Geometry (cs.CG); Computer Vision and Pattern Recognition (cs.CV); Algebraic Topology (math.AT)
[618]  arXiv:2405.17181 (cross-list from cs.LG) [pdf, other]
Title: Spectral regularization for adversarially-robust representation learning
Comments: 15 + 15 pages, 8 + 11 figures
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[619]  arXiv:2405.17167 (cross-list from eess.IV) [pdf, ps, other]
Title: Partitioned Hankel-based Diffusion Models for Few-shot Low-dose CT Reconstruction
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[620]  arXiv:2405.17141 (cross-list from eess.IV) [pdf, other]
Title: MVMS-RCN: A Dual-Domain Unfolding CT Reconstruction with Multi-sparse-view and Multi-scale Refinement-correction
Comments: 12 pages, submitted
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[621]  arXiv:2405.17116 (cross-list from cs.CL) [pdf, ps, other]
Title: Mixtures of Unsupervised Lexicon Classification
Comments: A draft on lexicon classification unsupervised learning. It shows that aggregating lexicon scores is equivalent to a finite mixture of multinomial Naive Bayes models. A very preliminary work of a few days man-hours, like a weekly report/note, but might be useful
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[622]  arXiv:2405.17029 (cross-list from eess.IV) [pdf, other]
Title: Multi-view Disparity Estimation Using a Novel Gradient Consistency Model
Comments: 11 pages, 11 figures. Submitted to Transactions on Image Processing
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[623]  arXiv:2405.16994 (cross-list from cs.AI) [pdf, other]
Title: Vision-and-Language Navigation Generative Pretrained Transformer
Authors: Wen Hanlin
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[624]  arXiv:2405.16942 (cross-list from eess.IV) [pdf, other]
Title: PASTA: Pathology-Aware MRI to PET Cross-Modal Translation with Diffusion Models
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[625]  arXiv:2405.16932 (cross-list from cs.RO) [pdf, other]
Title: CudaSIFT-SLAM: multiple-map visual SLAM for full procedure mapping in real human endoscopy
Comments: 10 pages, 10 figures, 6 tables, under revision
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[626]  arXiv:2405.16888 (cross-list from cs.GR) [pdf, other]
Title: Part123: Part-aware 3D Reconstruction from a Single-view Image
Comments: Accepted to SIGGRAPH 2024 (conference track), webpage: this https URL
Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV)
[627]  arXiv:2405.16850 (cross-list from eess.IV) [pdf, other]
Title: UniCompress: Enhancing Multi-Data Medical Image Compression with Knowledge Distillation
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[628]  arXiv:2405.16751 (cross-list from cs.AI) [pdf, other]
Title: LLM-Based Cooperative Agents using Information Relevance and Plan Validation
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Multiagent Systems (cs.MA)
[629]  arXiv:2405.16749 (cross-list from cs.LG) [pdf, other]
Title: DMPlug: A Plug-in Method for Solving Inverse Problems with Diffusion Models
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[630]  arXiv:2405.16692 (cross-list from cs.RO) [pdf, other]
Title: Planning Robot Placement for Object Grasping
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[631]  arXiv:2405.16640 (cross-list from cs.AI) [pdf, other]
Title: A Survey of Multimodal Large Language Model from A Data-centric Perspective
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[632]  arXiv:2405.16559 (cross-list from cs.RO) [pdf, other]
Title: Map-based Modular Approach for Zero-shot Embodied Question Answering
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[633]  arXiv:2405.16516 (cross-list from eess.IV) [pdf, other]
Title: Memory-efficient High-resolution OCT Volume Synthesis with Cascaded Amortized Latent Diffusion Models
Comments: Provisionally accepted for medical image computing and computer-assisted intervention (MICCAI) 2024
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[634]  arXiv:2405.16475 (cross-list from cs.LG) [pdf, other]
Title: Looks Too Good To Be True: An Information-Theoretic Analysis of Hallucinations in Generative Restoration Models
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[635]  arXiv:2405.16464 (cross-list from cs.RO) [pdf, other]
Title: Multi-Modal UAV Detection, Classification and Tracking Algorithm -- Technical Report for CVPR 2024 UG2 Challenge
Comments: Accepted by CVPR 2024 workshop. The 1st winning model in CVPR 2024 UG2+ challenge. The code and configuration of our method are available at this https URL
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[636]  arXiv:2405.16460 (cross-list from cs.LG) [pdf, other]
Title: Probabilistic Contrastive Learning with Explicit Concentration on the Hypersphere
Comments: technical report
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[637]  arXiv:2405.16418 (cross-list from cs.LG) [pdf, other]
Title: Unraveling the Smoothness Properties of Diffusion Models: A Gaussian Mixture Perspective
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[638]  arXiv:2405.16406 (cross-list from cs.LG) [pdf, other]
Title: SpinQuant: LLM quantization with learned rotations
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
[639]  arXiv:2405.16343 (cross-list from eess.IV) [pdf, other]
Title: Learning Point Spread Function Invertibility Assessment for Image Deconvolution
Comments: Accepted at EUSIPCO 2024
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[640]  arXiv:2405.16277 (cross-list from cs.CL) [pdf, other]
Title: Picturing Ambiguity: A Visual Twist on the Winograd Schema Challenge
Comments: 9 pages (excluding references), accepted to ACL 2024 Main Conference
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[641]  arXiv:2405.16248 (cross-list from eess.IV) [pdf, ps, other]
Title: Combining Radiomics and Machine Learning Approaches for Objective ASD Diagnosis: Verifying White Matter Associations with ASD
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)
[642]  arXiv:2405.16235 (cross-list from eess.IV) [pdf, ps, other]
Title: A better approach to diagnose retinal diseases: Combining our Segmentation-based Vascular Enhancement with deep learning features
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[643]  arXiv:2405.16114 (cross-list from cs.AI) [pdf, other]
Title: Multi-scale Quaternion CNN and BiGRU with Cross Self-attention Feature Fusion for Fault Diagnosis of Bearing
Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[644]  arXiv:2405.16112 (cross-list from cs.CR) [pdf, other]
Title: Mitigating Backdoor Attack by Injecting Proactive Defensive Backdoor
Comments: 13 pages, 5 figures and 5 tables
Subjects: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
[645]  arXiv:2405.16102 (cross-list from eess.IV) [pdf, other]
Title: Reliable Source Approximation: Source-Free Unsupervised Domain Adaptation for Vestibular Schwannoma MRI Segmentation
Comments: Early accepted by MICCAI 2024
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[646]  arXiv:2405.16036 (cross-list from cs.LG) [pdf, other]
Title: Certifying Adapters: Enabling and Enhancing the Certification of Classifier Adversarial Robustness
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
[647]  arXiv:2405.15971 (cross-list from cs.LG) [pdf, other]
Title: Robust width: A lightweight and certifiable adversarial defense
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
[648]  arXiv:2405.15925 (cross-list from eess.IV) [pdf, ps, other]
Title: MUCM-Net: A Mamba Powered UCM-Net for Skin Lesion Segmentation
Comments: 11 pages, 8 figures, journal paper (under review)
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[649]  arXiv:2405.15779 (cross-list from eess.IV) [pdf, ps, other]
Title: LiteNeXt: A Novel Lightweight ConvMixer-based Model with Self-embedding Representation Parallel for Medical Image Segmentation
Comments: 35 pages, 9 figures, 10 tables
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[650]  arXiv:2405.15778 (cross-list from eess.IV) [pdf, other]
Title: Investigation of Energy-efficient AI Model Architectures and Compression Techniques for "Green" Fetal Brain Segmentation
Comments: Submitted to International Conference on Computational Science (ICCS) 2024
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Performance (cs.PF)

Mon, 27 May 2024

[651]  arXiv:2405.15769 [pdf, other]
Title: FastDrag: Manipulate Anything in One Step
Comments: 13 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[652]  arXiv:2405.15763 [pdf, other]
Title: FreeMotion: A Unified Framework for Number-free Text-to-Motion Synthesis
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[653]  arXiv:2405.15758 [pdf, other]
Title: InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[654]  arXiv:2405.15757 [pdf, other]
Title: Looking Backward: Streaming Video-to-Video Translation with Feature Banks
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[655]  arXiv:2405.15755 [pdf, other]
Title: ETTrack: Enhanced Temporal Motion Predictor for Multi-Object Tracking
Comments: 16 pages, 7 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[656]  arXiv:2405.15738 [pdf, other]
Title: ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models
Comments: 17 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[657]  arXiv:2405.15734 [pdf, other]
Title: LM4LV: A Frozen Large Language Model for Low-level Vision Tasks
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[658]  arXiv:2405.15728 [pdf, other]
Title: Disease-informed Adaptation of Vision-Language Models
Comments: Early Accepted by MICCAI 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[659]  arXiv:2405.15719 [pdf, other]
Title: Hierarchical Uncertainty Exploration via Feedforward Posterior Trees
Comments: 32 pages, 21 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Machine Learning (stat.ML)
[660]  arXiv:2405.15700 [pdf, other]
Title: Trackastra: Transformer-based cell tracking for live-cell microscopy
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[661]  arXiv:2405.15688 [pdf, other]
Title: UNION: Unsupervised 3D Object Detection using Object Appearance-based Pseudo-Classes
Comments: Under review
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[662]  arXiv:2405.15687 [pdf, other]
Title: Chain-of-Thought Prompting for Demographic Inference with Large Multimodal Models
Comments: Accepted to ICME 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[663]  arXiv:2405.15684 [pdf, other]
Title: Prompt-Aware Adapter: Towards Learning Adaptive Visual Tokens for Multimodal Large Language Models
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[664]  arXiv:2405.15683 [pdf, other]
Title: VDGD: Mitigating LVLM Hallucinations in Cognitive Prompts by Bridging the Visual Perception Gap
Comments: Preprint. Under review. Code will be released on paper acceptance
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[665]  arXiv:2405.15668 [pdf, other]
Title: What Do You See? Enhancing Zero-Shot Image Classification with Multimodal Large Language Models
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[666]  arXiv:2405.15661 [pdf, other]
Title: Exposing Image Classifier Shortcuts with Counterfactual Frequency (CoF) Tables
Comments: 10 pages, 18 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[667]  arXiv:2405.15660 [pdf, other]
Title: Low-Light Video Enhancement via Spatial-Temporal Consistent Illumination and Reflection Decomposition
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[668]  arXiv:2405.15658 [pdf, other]
Title: HDC: Hierarchical Semantic Decoding with Counting Assistance for Generalized Referring Expression Segmentation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[669]  arXiv:2405.15638 [pdf, other]
Title: M4U: Evaluating Multilingual Understanding and Reasoning for Large Multimodal Models
Comments: Work in progress
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
[670]  arXiv:2405.15636 [pdf, other]
Title: Visualize and Paint GAN Activations
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[671]  arXiv:2405.15633 [pdf, other]
Title: Less is more: Summarizing Patch Tokens for efficient Multi-Label Class-Incremental Learning
Comments: Published at 3rd Conference on Lifelong Learning Agents (CoLLAs), 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[672]  arXiv:2405.15622 [pdf, other]
Title: LAM3D: Large Image-Point-Cloud Alignment Model for 3D Reconstruction from Single Image
Comments: 19 pages, 10 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[673]  arXiv:2405.15619 [pdf, other]
Title: DiffCalib: Reformulating Monocular Camera Calibration as Diffusion-Based Dense Incident Map Generation
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[674]  arXiv:2405.15596 [pdf, other]
Title: Multimodal Object Detection via Probabilistic a priori Information Integration
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[675]  arXiv:2405.15587 [pdf, other]
Title: Composed Image Retrieval for Remote Sensing
Comments: Accepted for ORAL presentation at the 2024 IEEE International Geoscience and Remote Sensing Symposium
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[676]  arXiv:2405.15580 [pdf, other]
Title: Open-Vocabulary SAM3D: Understand Any 3D Scene
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[677]  arXiv:2405.15574 [pdf, other]
Title: Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
Comments: Code is available in this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[678]  arXiv:2405.15567 [pdf, other]
Title: PyCellMech: A shape-based feature extraction pipeline for use in medical and biological studies
Comments: 5 pages, 1 figure, 1 table, in submission
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[679]  arXiv:2405.15563 [pdf, ps, other]
Title: Heterogeneous virus classification using a functional deep learning model based on transmission electron microscopy images (Preprint)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[680]  arXiv:2405.15550 [pdf, ps, other]
Title: CowScreeningDB: A public benchmark dataset for lameness detection in dairy cows
Journal-ref: Computers and Electronics in Agriculture, vol.216, pp.108500, 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[681]  arXiv:2405.15549 [pdf, other]
Title: SEP: Self-Enhanced Prompt Tuning for Visual-Language Model
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[682]  arXiv:2405.15541 [pdf, other]
Title: Learning Generalizable Human Motion Generator with Reinforcement Learning
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[683]  arXiv:2405.15524 [pdf, other]
Title: Polyp Segmentation Generalisability of Pretrained Backbones
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[684]  arXiv:2405.15518 [pdf, other]
Title: Feature Splatting for Better Novel View Synthesis with Low Overlap
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[685]  arXiv:2405.15491 [pdf, other]
Title: GSDeformer: Direct Cage-based Deformation for 3D Gaussian Splatting
Comments: For project page, see this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[686]  arXiv:2405.15477 [pdf, other]
Title: MagicBathyNet: A Multimodal Remote Sensing Dataset for Bathymetry Prediction and Pixel-based Classification in Shallow Waters
Comments: 5 pages, 3 figures, 5 tables. Accepted at IEEE International Geoscience and Remote Sensing Symposium (IGARSS) 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[687]  arXiv:2405.15475 [pdf, other]
Title: Efficient Degradation-aware Any Image Restoration
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[688]  arXiv:2405.15468 [pdf, other]
Title: Semantic Aware Diffusion Inverse Tone Mapping
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
[689]  arXiv:2405.15465 [pdf, other]
Title: Scale-Invariant Feature Disentanglement via Adversarial Learning for UAV-based Object Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[690]  arXiv:2405.15463 [pdf, other]
Title: PoinTramba: A Hybrid Transformer-Mamba Framework for Point Cloud Analysis
Comments: 14 pages, 4 figures, 6 tables, NIPS submission
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[691]  arXiv:2405.15451 [pdf, other]
Title: Self-distilled Dynamic Fusion Network for Language-based Fashion Retrieval
Comments: ICASSP 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR); Multimedia (cs.MM)
[692]  arXiv:2405.15439 [pdf, other]
Title: Text-guided 3D Human Motion Generation with Keyframe-based Parallel Skip Transformer
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[693]  arXiv:2405.15438 [pdf, other]
Title: Comparing remote sensing-based forest biomass mapping approaches using new forest inventory plots in contrasting forests in northeastern and southwestern China
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
[694]  arXiv:2405.15434 [pdf, other]
Title: Biometrics and Behavioral Modelling for Detecting Distractions in Online Learning
Comments: Accepted in CEDI 2024 (VII Congreso Español de Informática), A Coruña, Spain
Subjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
[695]  arXiv:2405.15428 [pdf, other]
Title: Enhancing Pollinator Conservation towards Agriculture 4.0: Monitoring of Bees through Object Recognition
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[696]  arXiv:2405.15405 [pdf, other]
Title: Transformer-based Federated Learning for Multi-Label Remote Sensing Image Classification
Comments: Accepted at IEEE International Geoscience and Remote Sensing Symposium (IGARSS) 2024. Our code is available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[697]  arXiv:2405.15395 [pdf, other]
Title: Fieldscale: Locality-Aware Field-based Adaptive Rescaling for Thermal Infrared Image
Comments: 9 pages, 8 figures, accepted to RA-L
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[698]  arXiv:2405.15394 [pdf, other]
Title: Leveraging knowledge distillation for partial multi-task learning from multiple remote sensing datasets
Comments: Accepted for oral presentation at IGARSS 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[699]  arXiv:2405.15385 [pdf, other]
Title: CPT-Interp: Continuous sPatial and Temporal Motion Modeling for 4D Medical Image Interpolation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Medical Physics (physics.med-ph)
[700]  arXiv:2405.15365 [pdf, other]
Title: U3M: Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[701]  arXiv:2405.15364 [pdf, other]
Title: NVS-Solver: Video Diffusion Model as Zero-Shot Novel View Synthesizer
Comments: Technical Report
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[702]  arXiv:2405.15356 [pdf, other]
Title: Alleviating Hallucinations in Large Vision-Language Models through Hallucination-Induced Optimization
Comments: 10 pages. arXiv admin note: text overlap with arXiv:2311.16922 by other authors
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[703]  arXiv:2405.15343 [pdf, other]
Title: Distinguish Any Fake Videos: Unleashing the Power of Large-scale Data and Motion Features
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[704]  arXiv:2405.15330 [pdf, other]
Title: Towards Understanding the Working Mechanism of Text-to-Image Diffusion Model
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[705]  arXiv:2405.15321 [pdf, other]
Title: SG-Adapter: Enhancing Text-to-Image Generation with Scene Graph Guidance
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[706]  arXiv:2405.15313 [pdf, other]
Title: Enhancing Text-to-Image Editing via Hybrid Mask-Informed Fusion
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[707]  arXiv:2405.15311 [pdf, other]
Title: Retro: Reusing teacher projection head for efficient embedding distillation on Lightweight Models via Self-supervised Learning
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[708]  arXiv:2405.15305 [pdf, other]
Title: Diff3DS: Generating View-Consistent 3D Sketch via Differentiable Curve Rendering
Comments: Project: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[709]  arXiv:2405.15299 [pdf, other]
Title: Transparent Object Depth Completion
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[710]  arXiv:2405.15289 [pdf, other]
Title: Learning Invariant Causal Mechanism from Vision-Language Models
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[711]  arXiv:2405.15287 [pdf, other]
Title: StyleMaster: Towards Flexible Stylized Image Generation with Diffusion Models
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[712]  arXiv:2405.15286 [pdf, other]
Title: 3D Unsupervised Learning by Distilling 2D Open-Vocabulary Segmentation Models for Autonomous Driving
Comments: 25 pages, 6 figures, codes are available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[713]  arXiv:2405.15279 [pdf, other]
Title: Towards Global Optimal Visual In-Context Learning Prompt Selection
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[714]  arXiv:2405.15278 [pdf, other]
Title: MindShot: Brain Decoding Framework Using Only One Image
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[715]  arXiv:2405.15274 [pdf, other]
Title: Talk to Parallel LiDARs: A Human-LiDAR Interaction Method Based on 3D Visual Grounding
Subjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
[716]  arXiv:2405.15269 [pdf, other]
Title: BDetCLIP: Multimodal Prompting Contrastive Test-Time Backdoor Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[717]  arXiv:2405.15267 [pdf, other]
Title: Off-the-shelf ChatGPT is a Good Few-shot Human Motion Predictor
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[718]  arXiv:2405.15265 [pdf, other]
Title: Cross-Domain Few-Shot Semantic Segmentation via Doubly Matching Transformation
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[719]  arXiv:2405.15264 [pdf, other]
Title: Self-Contrastive Weakly Supervised Learning Framework for Prognostic Prediction Using Whole Slide Images
Comments: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[720]  arXiv:2405.15253 [pdf, other]
Title: Seeing the World through an Antenna's Eye: Reception Quality Visualization Using Incomplete Technical Signal Information
Authors: Leif Bergerhoff
Comments: 5 pages, to be published in the conference proceedings of the European Signal Processing Conference (EUSIPCO) 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Numerical Analysis (math.NA); Data Analysis, Statistics and Probability (physics.data-an)
[721]  arXiv:2405.15243 [pdf, other]
Title: Less is More: Discovering Concise Network Explanations
Comments: 9 pages, 5 figures; ICLR Re-Align Workshop 2024; Project Page: this https URL; Github: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[722]  arXiv:2405.15239 [pdf, other]
Title: Automating the Diagnosis of Human Vision Disorders by Cross-modal 3D Generation
Comments: 25 pages, 16 figures, project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[723]  arXiv:2405.15234 [pdf, other]
Title: Defensive Unlearning with Adversarial Training for Robust Concept Erasure in Diffusion Models
Comments: Codes are available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR)
[724]  arXiv:2405.15232 [pdf, other]
Title: DEEM: Diffusion Models Serve as the Eyes of Large Language Models for Image Perception
Comments: 25 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
[725]  arXiv:2405.15225 [pdf, other]
Title: Unbiased Faster R-CNN for Single-source Domain Generalized Object Detection
Comments: CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[726]  arXiv:2405.15223 [pdf, other]
Title: iVideoGPT: Interactive VideoGPTs are Scalable World Models
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Robotics (cs.RO)
[727]  arXiv:2405.15222 [pdf, other]
Title: Leveraging Unknown Objects to Construct Labeled-Unlabeled Meta-Relationships for Zero-Shot Object Navigation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)
[728]  arXiv:2405.15217 [pdf, other]
Title: NIVeL: Neural Implicit Vector Layers for Text-to-Vector Generation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
[729]  arXiv:2405.15214 [pdf, other]
Title: PointRWKV: Efficient RWKV-Like Model for Hierarchical Point Cloud Learning
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[730]  arXiv:2405.15209 [pdf, other]
Title: Unsupervised Motion Segmentation for Neuromorphic Aerial Surveillance
Comments: 31 pages, 11 figures, 8 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[731]  arXiv:2405.15203 [pdf, other]
Title: Exploring the Impact of Synthetic Data for Aerial-view Human Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[732]  arXiv:2405.15199 [pdf, other]
Title: ODGEN: Domain-specific Object Detection Data Generation with Diffusion Models
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[733]  arXiv:2405.15196 [pdf, other]
Title: DisC-GS: Discontinuity-aware Gaussian Splatting
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[734]  arXiv:2405.15188 [pdf, other]
Title: PS-CAD: Local Geometry Guidance via Prompting and Selection for CAD Reconstruction
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[735]  arXiv:2405.15176 [pdf, other]
Title: MonoDETRNext: Next-generation Accurate and Efficient Monocular 3D Object Detection Method
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[736]  arXiv:2405.15173 [pdf, other]
Title: A3:Ambiguous Aberrations Captured via Astray-Learning for Facial Forgery Semantic Sublimation
Comments: 19 pages, 9 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[737]  arXiv:2405.15170 [pdf, other]
Title: Label-efficient Semantic Scene Completion with Scribble Annotations
Comments: Accepted by IJCAI2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[738]  arXiv:2405.15169 [pdf, other]
Title: Bring Adaptive Binding Prototypes to Generalized Referring Expression Segmentation
Comments: 11 pages, 7 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[739]  arXiv:2405.15160 [pdf, other]
Title: ARVideo: Autoregressive Pretraining for Self-Supervised Video Representation Learning
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[740]  arXiv:2405.15157 [pdf, other]
Title: Rethinking Class-Incremental Learning from a Dynamic Imbalanced Learning Perspective
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[741]  arXiv:2405.15155 [pdf, other]
Title: CLIP model is an Efficient Online Lifelong Learner
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[742]  arXiv:2405.15151 [pdf, other]
Title: NeB-SLAM: Neural Blocks-based Salable RGB-D SLAM for Unknown Scenes
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Robotics (cs.RO)
[743]  arXiv:2405.15137 [pdf, other]
Title: An Approximate Dynamic Programming Framework for Occlusion-Robust Multi-Object Tracking
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[744]  arXiv:2405.15125 [pdf, other]
Title: HDR-GS: Efficient High Dynamic Range Novel View Synthesis at 1000x Speed via Gaussian Splatting
Comments: The first 3D Gaussian Splatting-based method for HDR imaging
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[745]  arXiv:2405.15118 [pdf, other]
Title: GS-Hider: Hiding Messages into 3D Gaussian Splatting
Comments: 3DGS steganography
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[746]  arXiv:2405.15033 [pdf, other]
Title: Generating camera failures as a class of physics-based adversarial examples
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)
[747]  arXiv:2405.15020 [pdf, other]
Title: AdjointDEIS: Efficient Gradients for Diffusion Models
Comments: Initial pre-print
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[748]  arXiv:2405.14986 [pdf, ps, other]
Title: Hand bone age estimation using divide and conquer strategy and lightweight convolutional neural networks
Journal-ref: Engineering Applications of Artificial Intelligence, Volume 120, 2023, 105935, ISSN 0952-1976
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[749]  arXiv:2405.14977 [pdf, other]
Title: A Lost Opportunity for Vision-Language Models: A Comparative Study of Online Test-time Adaptation for Vision-Language Models
Comments: Accepted at CVPR 2024 MAT Workshop Community Track
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[750]  arXiv:2405.14974 [pdf, other]
Title: LOVA3: Learning to Visual Question Answering, Asking and Assessment
Comments: The code is available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[751]  arXiv:2405.14961 [pdf, other]
Title: SFDDM: Single-fold Distillation for Diffusion models
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[752]  arXiv:2405.14959 [pdf, other]
Title: EvGGS: A Collaborative Learning Framework for Event-based Generalizable Gaussian Splatting
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[753]  arXiv:2405.14883 [pdf, other]
Title: Spectral Image Data Fusion for Multisource Data Augmentation
Comments: 32 pages, in review process
Subjects: Computer Vision and Pattern Recognition (cs.CV); Numerical Analysis (math.NA)
[754]  arXiv:2405.14882 [pdf, other]
Title: LookUp3D: Data-Driven 3D Scanning
Comments: 10 pages and 4 ancillary files
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Image and Video Processing (eess.IV)
[755]  arXiv:2405.14881 [pdf, other]
Title: DiffuseMix: Label-Preserving Data Augmentation with Diffusion Models
Comments: Accepted at CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[756]  arXiv:2405.14880 [pdf, other]
Title: Dissecting Query-Key Interaction in Vision Transformers
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[757]  arXiv:2405.14879 [pdf, ps, other]
Title: Automatic Coral Detection with YOLO: A Deep Learning Approach for Efficient and Accurate Coral Reef Monitoring
Authors: Ouassine Younes (LISI, Computer Science Department), Zahir Jihad (LISI, Computer Science Department), Conruyt Noël (LIM), Kayal Mohsen (ENTROPIE (Nouvelle-Calédonie)), A. Martin Philippe (LIM), Chenin Eric (UMMISCO), Bigot Lionel (ENTROPIE (Réunion)), Vignes Lebbe Regine (ISYEB)
Journal-ref: ECAI 2023 International Workshops, Sep 2023, Kraków, France. pp.170-177
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[758]  arXiv:2405.14877 [pdf, other]
Title: Visual Deformation Detection Using Soft Material Simulation for Pre-training of Condition Assessment Models
Comments: 6 pages, 4 figures, submitted to CASE
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
[759]  arXiv:2405.14876 [pdf, other]
Title: Precise and Robust Sidewalk Detection: Leveraging Ensemble Learning to Surpass LLM Limitations in Urban Environments
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[760]  arXiv:2405.14874 [pdf, other]
Title: Investigating Robustness of Open-Vocabulary Foundation Object Detectors under Distribution Shifts
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[761]  arXiv:2405.15766 (cross-list from cs.AI) [pdf, other]
Title: Enhancing Adverse Drug Event Detection with Multimodal Dataset: Corpus Creation and Model Development
Comments: ACL Findings 2024
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
[762]  arXiv:2405.15677 (cross-list from cs.RO) [pdf, other]
Title: SMART: Scalable Multi-agent Real-time Simulation via Next-token Prediction
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[763]  arXiv:2405.15664 (cross-list from cs.RO) [pdf, other]
Title: GroundGrid:LiDAR Point Cloud Ground Segmentation and Terrain Estimation
Comments: This letter has been accepted for publication in IEEE Robotics and Automation Letters
Journal-ref: IEEE Robotics and Automation Letters, vol. 9, no. 1, pp. 420-426, Jan. 2024
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[764]  arXiv:2405.15613 (cross-list from cs.LG) [pdf, other]
Title: Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[765]  arXiv:2405.15517 (cross-list from eess.IV) [pdf, other]
Title: Erase to Enhance: Data-Efficient Machine Unlearning in MRI Reconstruction
Comments: The paper is accepted by MIDL 2024
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[766]  arXiv:2405.15500 (cross-list from eess.IV) [pdf, other]
Title: Hierarchical Loss And Geometric Mask Refinement For Multilabel Ribs Segmentation
Comments: Accepted to IEEE ISBI 2024
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[767]  arXiv:2405.15476 (cross-list from cs.LG) [pdf, other]
Title: Editable Concept Bottleneck Models
Comments: 33 pages
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[768]  arXiv:2405.15442 (cross-list from eess.IV) [pdf, other]
Title: Towards Precision Healthcare: Robust Fusion of Time Series and Image Data
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[769]  arXiv:2405.15425 (cross-list from cs.GR) [pdf, other]
Title: Volumetric Primitives for Modeling and Rendering Scattering and Emissive Media
Comments: 17 pages, 10 figures
Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV)
[770]  arXiv:2405.15413 (cross-list from eess.IV) [pdf, other]
Title: MambaVC: Learned Visual Compression with Selective State Spaces
Comments: 17 pages, 15 figures
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Information Theory (cs.IT)
[771]  arXiv:2405.15398 (cross-list from cs.CE) [pdf, other]
Title: PriCE: Privacy-Preserving and Cost-Effective Scheduling for Parallelizing the Large Medical Image Processing Workflow over Hybrid Clouds
Comments: Accepted at Europar 2024
Subjects: Computational Engineering, Finance, and Science (cs.CE); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC); Emerging Technologies (cs.ET)
[772]  arXiv:2405.15341 (cross-list from cs.AI) [pdf, other]
Title: V-Zen: Efficient GUI Understanding and Precise Grounding With A Novel Multimodal LLM
Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[773]  arXiv:2405.15324 (cross-list from cs.RO) [pdf, other]
Title: Continuously Learning, Adapting, and Improving: A Dual-Process Approach to Autonomous Driving
Comments: 23 pages, 16 figures
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[774]  arXiv:2405.15304 (cross-list from cs.LG) [pdf, other]
Title: Unlearning Concepts in Diffusion Model via Concept Domain Correction and Concept Preserving Gradient
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[775]  arXiv:2405.15275 (cross-list from eess.IV) [pdf, other]
Title: NMGrad: Advancing Histopathological Bladder Cancer Grading with Weakly Supervised Deep Learning
Comments: this https URL
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Quantitative Methods (q-bio.QM)
[776]  arXiv:2405.15241 (cross-list from eess.IV) [pdf, other]
Title: Blaze3DM: Marry Triplane Representation with Diffusion for 3D Medical Inverse Problem Solving
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[777]  arXiv:2405.15240 (cross-list from cs.LG) [pdf, other]
Title: Towards Real World Debiasing: A Fine-grained Analysis On Spurious Correlation
Comments: 9 pages of main paper, 10 pages of appendix
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[778]  arXiv:2405.15228 (cross-list from cs.LG) [pdf, other]
Title: Learning from True-False Labels via Multi-modal Prompt Retrieving
Comments: 15 pages, 4 figures
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[779]  arXiv:2405.15205 (cross-list from eess.IV) [pdf, other]
Title: Enhancing Generalized Fetal Brain MRI Segmentation using A Cascade Network with Depth-wise Separable Convolution and Attention Mechanism
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[780]  arXiv:2405.15161 (cross-list from cs.CR) [pdf, other]
Title: Are You Copying My Prompt? Protecting the Copyright of Vision Prompt for VPaaS via Watermark
Comments: 11 pages, 7 figures
Subjects: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
[781]  arXiv:2405.15127 (cross-list from eess.IV) [pdf, other]
Title: Benchmarking Hierarchical Image Pyramid Transformer for the classification of colon biopsies and polyps in histopathology images
Comments: 4 pages, 3 figures, to be published in the 2024 IEEE International Symposium on Biomedical Imaging (ISBI) proceedings
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[782]  arXiv:2405.15098 (cross-list from eess.IV) [pdf, ps, other]
Title: Magnetic Resonance Image Processing Transformer for General Reconstruction
Comments: 29 pages, 3 figures, 3 tables
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Medical Physics (physics.med-ph)
[783]  arXiv:2405.15083 (cross-list from cs.AI) [pdf, other]
Title: MuDreamer: Learning Predictive World Models without Reconstruction
Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[784]  arXiv:2405.15056 (cross-list from cs.LG) [pdf, other]
Title: ElastoGen: 4D Generative Elastodynamics
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
[785]  arXiv:2405.15018 (cross-list from cs.LG) [pdf, other]
Title: What Variables Affect Out-Of-Distribution Generalization in Pretrained Models?
Comments: Preprint
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[786]  arXiv:2405.14979 (cross-list from cs.GR) [pdf, other]
Title: CraftsMan: High-fidelity Mesh Generation with 3D Native Generation and Interactive Geometry Refiner
Comments: HomePage: this https URL, Code: this https URL
Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV)
[787]  arXiv:2405.14934 (cross-list from eess.IV) [pdf, other]
Title: Universal Robustness via Median Randomized Smoothing for Real-World Super-Resolution
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[788]  arXiv:2405.14900 (cross-list from eess.IV) [pdf, other]
Title: Fair Evaluation of Federated Learning Algorithms for Automated Breast Density Classification: The Results of the 2022 ACR-NCI-NVIDIA Federated Learning Challenge
Authors: Kendall Schmidt (American College of Radiology, USA), Benjamin Bearce (The Massachusetts General Hospital, USA and University of Colorado, USA), Ken Chang (The Massachusetts General Hospital), Laura Coombs (American College of Radiology, USA), Keyvan Farahani (National Institutes of Health National Cancer Institute, USA), Marawan Elbatele (Computer Vision and Robotics Institute, University of Girona, Spain), Kaouther Mouhebe (Computer Vision and Robotics Institute, University of Girona, Spain), Robert Marti (Computer Vision and Robotics Institute, University of Girona, Spain), Ruipeng Zhang (Cooperative Medianet Innovation Center, Shanghai Jiao Tong University, China and Shanghai AI Laboratory, China), Yao Zhang (Shanghai AI Laboratory, China), Yanfeng Wang (Cooperative Medianet Innovation Center, Shanghai Jiao Tong University, China and Shanghai AI Laboratory, China), et al. (13 additional authors not shown)
Comments: 16 pages, 9 figures
Journal-ref: Medical Image Analysis Volume 95, July 2024, 103206
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[789]  arXiv:2405.14878 (cross-list from eess.IV) [pdf, other]
Title: Improving and Evaluating Machine Learning Methods for Forensic Shoeprint Matching
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Applications (stat.AP)
[790]  arXiv:2405.14875 (cross-list from eess.IV) [pdf, ps, other]
Title: BloodCell-Net: A lightweight convolutional neural network for the classification of all microscopic blood cell images of the human body
Comments: 24 pages, 7 tables and 13 figures
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)