Computer Vision and Pattern Recognition

Authors and titles for recent submissions

Total of 593 entries

Fri, 26 Apr 2024

[1] arXiv:2404.16831 [pdf, other]: Title: The Third Monocular Depth Estimation Challenge

Authors: Jaime Spencer, Fabio Tosi, Matteo Poggi, Ripudaman Singh Arora, Chris Russell, Simon Hadfield, Richard Bowden, GuangYuan Zhou, ZhengXin Li, Qiang Rao, YiPing Bao, Xiao Liu, Dohyeong Kim, Jinseong Kim, Myunghyun Kim, Mykola Lavreniuk, Rui Li, Qing Mao, Jiang Wu, Yu Zhu, Jinqiu Sun, Yanning Zhang, Suraj Patni, Aradhye Agarwal, Chetan Arora, Pihai Sun, Kui Jiang, Gang Wu, Jian Liu, Xianming Liu, Junjun Jiang, Xidan Zhang, Jianing Wei, Fangjun Wang, Zhiming Tan, Jiabao Wang, Albert Luginov, Muhammad Shahzad, Seyed Hosseini, Aleksander Trajcevski, James H. Elder

Comments: To appear in CVPRW2024

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[2] arXiv:2404.16829 [pdf, other]: Title: Make-it-Real: Unleashing Large Multimodal Model's Ability for Painting 3D Objects with Realistic Materials

Authors: Ye Fang, Zeyi Sun, Tong Wu, Jiaqi Wang, Ziwei Liu, Gordon Wetzstein, Dahua Lin

Comments: Project Page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[3] arXiv:2404.16828 [pdf, other]: Title: Made to Order: Discovering monotonic temporal changes via self-supervised video ordering

Authors: Charig Yang, Weidi Xie, Andrew Zisserman

Comments: Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[4] arXiv:2404.16825 [pdf, other]: Title: ResVR: Joint Rescaling and Viewport Rendering of Omnidirectional Images

Authors: Weiqi Li, Shijie Zhao, Bin Chen, Xinhua Cheng, Junlin Li, Li Zhang, Jian Zhang

Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[5] arXiv:2404.16824 [pdf, other]: Title: V2A-Mark: Versatile Deep Visual-Audio Watermarking for Manipulation Localization and Copyright Protection

Authors: Xuanyu Zhang, Youmin Xu, Runyi Li, Jiwen Yu, Weiqi Li, Zhipei Xu, Jian Zhang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[6] arXiv:2404.16821 [pdf, other]: Title: How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

Authors: Zhe Chen, Weiyun Wang, Hao Tian, Shenglong Ye, Zhangwei Gao, Erfei Cui, Wenwen Tong, Kongzhi Hu, Jiapeng Luo, Zheng Ma, Ji Ma, Jiaqi Wang, Xiaoyi Dong, Hang Yan, Hewei Guo, Conghui He, Zhenjiang Jin, Chao Xu, Bin Wang, Xingjian Wei, Wei Li, Wenjian Zhang, Lewei Lu, Xizhou Zhu, Tong Lu, Dahua Lin, Yu Qiao

Comments: Technical report

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[7] arXiv:2404.16820 [pdf, other]: Title: Revisiting Text-to-Image Evaluation with Gecko: On Metrics, Prompts, and Human Ratings

Authors: Olivia Wiles, Chuhan Zhang, Isabela Albuquerque, Ivana Kajić, Su Wang, Emanuele Bugliarello, Yasumasa Onoe, Chris Knutsen, Cyrus Rashtchian, Jordi Pont-Tuset, Aida Nematzadeh

Comments: Data and code will be released at: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[8] arXiv:2404.16818 [pdf, other]: Title: Boosting Unsupervised Semantic Segmentation with Principal Mask Proposals

Authors: Oliver Hahn, Nikita Araslanov, Simone Schaub-Meyer, Stefan Roth

Comments: Code: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[9] arXiv:2404.16814 [pdf, other]: Title: Meta-Transfer Derm-Diagnosis: Exploring Few-Shot Learning and Transfer Learning for Skin Disease Classification in Long-Tail Distribution

Authors: Zeynep Özdemir, Hacer Yalim Keles, Ömer Özgür Tanrıöver

Comments: 17 pages, 5 figures, 6 tables, submitted to IEEE Journal of Biomedical and Health Informatics

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[10] arXiv:2404.16804 [pdf, other]: Title: AAPL: Adding Attributes to Prompt Learning for Vision-Language Models

Authors: Gahyeon Kim, Sohee Kim, Seokju Lee

Comments: Accepted to CVPR 2024 Workshop on Prompting in Vision, Project Page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[11] arXiv:2404.16790 [pdf, other]: Title: SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension

Authors: Bohao Li, Yuying Ge, Yi Chen, Yixiao Ge, Ruimao Zhang, Ying Shan

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[12] arXiv:2404.16781 [pdf, other]: Title: Registration by Regression (RbR): a framework for interpretable and flexible atlas registration

Authors: Karthik Gopinath, Xiaoling Hu, Malte Hoffmann, Oula Puonti, Juan Eugenio Iglesias

Comments: 11 pages, 3 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[13] arXiv:2404.16773 [pdf, other]: Title: ConKeD++ -- Improving descriptor learning for retinal image registration: A comprehensive study of contrastive losses

Authors: David Rivas-Villar, Álvaro S. Hervella, José Rouco, Jorge Novo

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[14] arXiv:2404.16771 [pdf, other]: Title: ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving

Authors: Jiehui Huang, Xiao Dong, Wenhui Song, Hanhui Li, Jun Zhou, Yuhao Cheng, Shutao Liao, Long Chen, Yiqiang Yan, Shengcai Liao, Xiaodan Liang

Comments: Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[15] arXiv:2404.16754 [pdf, other]: Title: RadGenome-Chest CT: A Grounded Vision-Language Dataset for Chest CT Analysis

Authors: Xiaoman Zhang, Chaoyi Wu, Ziheng Zhao, Jiayu Lei, Ya Zhang, Yanfeng Wang, Weidi Xie

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[16] arXiv:2404.16752 [pdf, other]: Title: TokenHMR: Advancing Human Mesh Recovery with a Tokenized Pose Representation

Authors: Sai Kumar Dwivedi, Yu Sun, Priyanka Patel, Yao Feng, Michael J. Black

Comments: CVPR 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[17] arXiv:2404.16748 [pdf, other]: Title: TELA: Text to Layer-wise 3D Clothed Human Generation

Authors: Junting Dong, Qi Fang, Zehuan Huang, Xudong Xu, Jingbo Wang, Sida Peng, Bo Dai

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[18] arXiv:2404.16739 [pdf, ps, other]: Title: CBRW: A Novel Approach for Cancelable Biometric Template Generation based on

Authors: Nitin Kumar, Manisha

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[19] arXiv:2404.16717 [pdf, other]: Title: Embracing Diversity: Interpretable Zero-shot classification beyond one vector per class

Authors: Mazda Moayeri, Michael Rabbat, Mark Ibrahim, Diane Bouchacourt

Comments: Accepted to FAccT 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
[20] arXiv:2404.16687 [pdf, other]: Title: NTIRE 2024 Quality Assessment of AI-Generated Content Challenge

Authors: Xiaohong Liu, Xiongkuo Min, Guangtao Zhai, Chunyi Li, Tengchuan Kou, Wei Sun, Haoning Wu, Yixuan Gao, Yuqin Cao, Zicheng Zhang, Xiele Wu, Radu Timofte

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[21] arXiv:2404.16685 [pdf, other]: Title: Multi-scale HSV Color Feature Embedding for High-fidelity NIR-to-RGB Spectrum Translation

Authors: Huiyu Zhai, Mo Chen, Xingxing Yang, Gusheng Kang

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[22] arXiv:2404.16678 [pdf, other]: Title: Multimodal Semantic-Aware Automatic Colorization with Diffusion Prior

Authors: Han Wang, Xinning Chai, Yiwen Wang, Yuhong Zhang, Rong Xie, Li Song

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[23] arXiv:2404.16670 [pdf, other]: Title: EmoVIT: Revolutionizing Emotion Insights with Visual Instruction Tuning

Authors: Hongxia Xie, Chu-Jun Peng, Yu-Wen Tseng, Hung-Jen Chen, Chan-Feng Hsu, Hong-Han Shuai, Wen-Huang Cheng

Comments: Accepted by CVPR 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[24] arXiv:2404.16666 [pdf, other]: Title: PhyRecon: Physically Plausible Neural Scene Reconstruction

Authors: Junfeng Ni, Yixin Chen, Bohan Jing, Nan Jiang, Bin Wang, Bo Dai, Yixin Zhu, Song-Chun Zhu, Siyuan Huang

Comments: project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[25] arXiv:2404.16637 [pdf, other]: Title: Zero-Shot Distillation for Image Encoders: How to Make Effective Use of Synthetic Data

Authors: Niclas Popp, Jan Hendrik Metzen, Matthias Hein

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[26] arXiv:2404.16635 [pdf, other]: Title: TinyChart: Efficient Chart Understanding with Visual Token Merging and Program-of-Thoughts Learning

Authors: Liang Zhang, Anwen Hu, Haiyang Xu, Ming Yan, Yichen Xu, Qin Jin, Ji Zhang, Fei Huang

Comments: 13 pages, 11 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[27] arXiv:2404.16633 [pdf, other]: Title: Self-Balanced R-CNN for Instance Segmentation

Authors: Leonardo Rossi, Akbar Karimi, Andrea Prati

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[28] arXiv:2404.16622 [pdf, other]: Title: DAVE -- A Detect-and-Verify Paradigm for Low-Shot Counting

Authors: Jer Pelhan, Alan Lukežič, Vitjan Zavrtanik, Matej Kristan

Comments: Accepted to CVPR2024

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[29] arXiv:2404.16617 [pdf, other]: Title: Denoising: from classical methods to deep CNNs

Authors: Jean-Eric Campagne

Comments: 33 pages, 33 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); History and Overview (math.HO)
[30] arXiv:2404.16612 [pdf, other]: Title: MuseumMaker: Continual Style Customization without Catastrophic Forgetting

Authors: Chenxi Liu, Gan Sun, Wenqi Liang, Jiahua Dong, Can Qin, Yang Cong

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[31] arXiv:2404.16609 [pdf, other]: Title: SFMViT: SlowFast Meet ViT in Chaotic World

Authors: Jiaying Lin, Jiajun Wen, Mengyuan Liu, Jinfu Liu, Baiqiao Yin, Yue Li

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[32] arXiv:2404.16581 [pdf, other]: Title: AudioScenic: Audio-Driven Video Scene Editing

Authors: Kaixin Shen, Ruijie Quan, Linchao Zhu, Jun Xiao, Yi Yang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[33] arXiv:2404.16578 [pdf, other]: Title: Road Surface Friction Estimation for Winter Conditions Utilising General Visual Features

Authors: Risto Ojala, Eerik Alamikkotervo

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[34] arXiv:2404.16573 [pdf, other]: Title: Multi-Scale Representations by Varying Window Attention for Semantic Segmentation

Authors: Haotian Yan, Ming Wu, Chuang Zhang

Comments: ICLR2024 Poster

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[35] arXiv:2404.16571 [pdf, other]: Title: MonoPCC: Photometric-invariant Cycle Constraint for Monocular Depth Estimation of Endoscopic Images

Authors: Zhiwei Wang, Ying Zhou, Shiquan He, Ting Li, Yitong Zhang, Xinxia Feng, Mei Liu, Qiang Li

Comments: 9 pages, 7 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[36] arXiv:2404.16561 [pdf, ps, other]: Title: Research on geometric figure classification algorithm based on Deep Learning

Authors: Ruiyang Wang, Haonan Wang, Junfeng Sun, Mingjia Zhao, Meng Liu

Comments: 6 pages, 9 figures

Journal-ref: Scientific Journal of Intelligent Systems Research, Volume 4 Issue 6, 2022

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[37] arXiv:2404.16558 [pdf, other]: Title: DeepKalPose: An Enhanced Deep-Learning Kalman Filter for Temporally Consistent Monocular Vehicle Pose Estimation

Authors: Leandro Di Bella, Yangxintong Lyu, Adrian Munteanu

Comments: 4 pages, 3 Figures, published to IET Electronic Letters

Journal-ref: Electronics Letters (ISSN: 00135194), year: 2024, volume: 60, number: 8, start page: ?

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)
[38] arXiv:2404.16557 [pdf, other]: Title: Energy-Latency Manipulation of Multi-modal Large Language Models via Verbose Samples

Authors: Kuofeng Gao, Jindong Gu, Yang Bai, Shu-Tao Xia, Philip Torr, Wei Liu, Zhifeng Li

Comments: arXiv admin note: substantial text overlap with arXiv:2401.11170

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[39] arXiv:2404.16556 [pdf, other]: Title: Conditional Distribution Modelling for Few-Shot Image Synthesis with Diffusion Models

Authors: Parul Gupta, Munawar Hayat, Abhinav Dhall, Thanh-Toan Do

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[40] arXiv:2404.16552 [pdf, other]: Title: Efficient Solution of Point-Line Absolute Pose

Authors: Petr Hruby, Timothy Duff, Marc Pollefeys

Comments: CVPR 2024, 11 pages, 8 figures, 5 tables

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[41] arXiv:2404.16548 [pdf, other]: Title: Cross-Domain Spatial Matching for Camera and Radar Sensor Data Fusion in Autonomous Vehicle Perception System

Authors: Daniel Dworak, Mateusz Komorkiewicz, Paweł Skruch, Jerzy Baranowski

Comments: 12 pages including highlights and graphical abstract, submitted to Expert Systems with Applications journal

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[42] arXiv:2404.16538 [pdf, other]: Title: OpenDlign: Enhancing Open-World 3D Learning with Depth-Aligned Images

Authors: Ye Mao, Junpeng Jing, Krystian Mikolajczyk

Comments: 12 pages

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[43] arXiv:2404.16536 [pdf, other]: Title: 3D Face Modeling via Weakly-supervised Disentanglement Network joint Identity-consistency Prior

Authors: Guohao Li, Hongyu Yang, Di Huang, Yunhong Wang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[44] arXiv:2404.16507 [pdf, other]: Title: Semantic-aware Next-Best-View for Multi-DoFs Mobile System in Search-and-Acquisition based Visual Perception

Authors: Xiaotong Yu, Chang-Wen Chen

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[45] arXiv:2404.16501 [pdf, other]: Title: 360SFUDA++: Towards Source-free UDA for Panoramic Segmentation by Learning Reliable Category Prototypes

Authors: Xu Zheng, Pengyuan Zhou, Athanasios V. Vasilakos, Lin Wang

Comments: arXiv admin note: substantial text overlap with arXiv:2403.12505

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[46] arXiv:2404.16493 [pdf, other]: Title: Commonsense Prototype for Outdoor Unsupervised 3D Object Detection

Authors: Hai Wu, Shijia Zhao, Xun Huang, Chenglu Wen, Xin Li, Cheng Wang

Comments: Accepted by CVPR 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[47] arXiv:2404.16484 [pdf, other]: Title: Real-Time 4K Super-Resolution of Compressed AVIF Images. AIS 2024 Challenge Survey

Authors: Marcos V. Conde, Zhijun Lei, Wen Li, Cosmin Stejerean, Ioannis Katsavounidis, Radu Timofte, Kihwan Yoon, Ganzorig Gankhuyag, Jiangtao Lv, Long Sun, Jinshan Pan, Jiangxin Dong, Jinhui Tang, Zhiyuan Li, Hao Wei, Chenyang Ge, Dongyang Zhang, Tianle Liu, Huaian Chen, Yi Jin, Menghan Zhou, Yiqiang Yan, Si Gao, Biao Wu, Shaoli Liu, Chengjian Zheng, Diankai Zhang, Ning Wang, Xintao Qiu, Yuanbo Zhou, Kongxian Wu, Xinwei Dai, Hui Tang, Wei Deng, Qingquan Gao, Tong Tong, Jae-Hyeon Lee, Ui-Jin Choi, Min Yan, Xin Liu, Qian Wang, Xiaoqian Ye, Zhan Du, Tiansen Zhang, Long Peng, Jiaming Guo, Xin Di, Bohao Liao, Zhibo Du, Peize Xia, Renjing Pei, Yang Wang, Yang Cao, Zhengjun Zha, Bingnan Han, Hongyuan Yu, Zhuoyuan Wu, Cheng Wan, Yuqing Liu, Haodong Yu, Jizhe Li, Zhijuan Huang, Yuan Huang, Yajun Zou, Xianyu Guan, et al. (10 additional authors not shown)

Comments: CVPR 2024, AI for Streaming (AIS) Workshop

Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[48] arXiv:2404.16474 [pdf, other]: Title: DiffSeg: A Segmentation Model for Skin Lesions Based on Diffusion Difference

Authors: Zhihao Shuai, Yinan Chen, Shunqiang Mao, Yihan Zho, Xiaohong Zhang

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[49] arXiv:2404.16471 [pdf, other]: Title: COBRA -- COnfidence score Based on shape Regression Analysis for method-independent quality assessment of object pose estimation from single images

Authors: Panagiotis Sapoutzoglou, Georgios Giapitzakis Tzintanos, George Terzakis, Maria Pateraki

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[50] arXiv:2404.16456 [pdf, other]: Title: Correlation-Decoupled Knowledge Distillation for Multimodal Sentiment Analysis with Incomplete Modalities

Authors: Mingcheng Li, Dingkang Yang, Xiao Zhao, Shuaibing Wang, Yan Wang, Kun Yang, Mingyang Sun, Dongliang Kou, Ziyun Qian, Lihua Zhang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[51] arXiv:2404.16452 [pdf, other]: Title: PAD: Patch-Agnostic Defense against Adversarial Patch Attacks

Authors: Lihua Jing, Rui Wang, Wenqi Ren, Xin Dong, Cong Zou

Comments: Accepted by CVPR 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[52] arXiv:2404.16451 [pdf, other]: Title: Latent Modulated Function for Computational Optimal Continuous Image Representation

Authors: Zongyao He, Zhi Jin

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[53] arXiv:2404.16432 [pdf, other]: Title: Point-JEPA: A Joint Embedding Predictive Architecture for Self-Supervised Learning on Point Cloud

Authors: Ayumu Saito, Jiju Poovvancheri

Comments: 10 pages, 4 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[54] arXiv:2404.16429 [pdf, other]: Title: Depth Supervised Neural Surface Reconstruction from Airborne Imagery

Authors: Vincent Hackstein, Paul Fauth-Mayer, Matthias Rothermel, Norbert Haala

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[55] arXiv:2404.16423 [pdf, other]: Title: Neural Assembler: Learning to Generate Fine-Grained Robotic Assembly Instructions from Multi-View Images

Authors: Hongyu Yan, Yadong Mu

Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[56] arXiv:2404.16422 [pdf, other]: Title: Robust Fine-tuning for Pre-trained 3D Point Cloud Models

Authors: Zhibo Zhang, Ximing Yang, Weizhong Zhang, Cheng Jin

Comments: 9 pages, 5 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[57] arXiv:2404.16421 [pdf, other]: Title: SynCellFactory: Generative Data Augmentation for Cell Tracking

Authors: Moritz Sturm, Lorenzo Cerrone, Fred A. Hamprecht

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[58] arXiv:2404.16416 [pdf, other]: Title: Learning Discriminative Spatio-temporal Representations for Semi-supervised Action Recognition

Authors: Yu Wang, Sanping Zhou, Kun Xia, Le Wang

Comments: 10 pages, 6 figures, 6 tables, 56 references

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[59] arXiv:2404.16409 [pdf, other]: Title: Cross-sensor super-resolution of irregularly sampled Sentinel-2 time series

Authors: Aimi Okabayashi (IRISA, OBELIX), Nicolas Audebert (CEDRIC - VERTIGO, CNAM, LaSTIG, IGN), Simon Donike (IPL), Charlotte Pelletier (OBELIX, IRISA)

Journal-ref: EARTHVISION 2024 IEEE/CVF CVPR Workshop. Large Scale Computer Vision for Remote Sensing Imagery, Jun 2024, Seattle, United States

Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[60] arXiv:2404.16398 [pdf, other]: Title: Revisiting Relevance Feedback for CLIP-based Interactive Image Retrieval

Authors: Ryoya Nara, Yu-Chieh Lin, Yuji Nozawa, Youyang Ng, Goh Itoh, Osamu Torii, Yusuke Matsui

Comments: 20 pages, 8 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[61] arXiv:2404.16386 [pdf, other]: Title: Promoting CNNs with Cross-Architecture Knowledge Distillation for Efficient Monocular Depth Estimation

Authors: Zhimeng Zheng, Tao Huang, Gongsheng Li, Zuyi Wang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[62] arXiv:2404.16385 [pdf, other]: Title: Efficiency in Focus: LayerNorm as a Catalyst for Fine-tuning Medical Visual Language Pre-trained Models

Authors: Jiawei Chen, Dingkang Yang, Yue Jiang, Mingcheng Li, Jinjie Wei, Xiaolu Hou, Lihua Zhang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[63] arXiv:2404.16380 [pdf, ps, other]: Title: Efficient Higher-order Convolution for Small Kernels in Deep Learning

Authors: Zuocheng Wen, Lingzhong Guo

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[64] arXiv:2404.16375 [pdf, other]: Title: List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs

Authors: An Yan, Zhengyuan Yang, Junda Wu, Wanrong Zhu, Jianwei Yang, Linjie Li, Kevin Lin, Jianfeng Wang, Julian McAuley, Jianfeng Gao, Lijuan Wang

Comments: Preprint

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[65] arXiv:2404.16371 [pdf, other]: Title: Multimodal Information Interaction for Medical Image Segmentation

Authors: Xinxin Fan, Lin Liu, Haoran Zhang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[66] arXiv:2404.16359 [pdf, other]: Title: An Improved Graph Pooling Network for Skeleton-Based Action Recognition

Authors: Cong Wu, Xiao-Jun Wu, Tianyang Xu, Josef Kittler

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[67] arXiv:2404.16348 [pdf, other]: Title: Dual Expert Distillation Network for Generalized Zero-Shot Learning

Authors: Zhijie Rao, Jingcai Guo, Xiaocheng Lu, Jingming Liang, Jie Zhang, Haozhao Wang, Kang Wei, Xiaofeng Cao

Comments: 11 pages, 4 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[68] arXiv:2404.16339 [pdf, other]: Title: Training-Free Unsupervised Prompt for Vision-Language Models

Authors: Sifan Long, Linbin Wang, Zhen Zhao, Zichang Tan, Yiming Wu, Shengsheng Wang, Jingdong Wang

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[69] arXiv:2404.16331 [pdf, other]: Title: IMWA: Iterative Model Weight Averaging Benefits Class-Imbalanced Learning Tasks

Authors: Zitong Huang, Ze Chen, Bowen Dong, Chaoqi Liang, Erjin Zhou, Wangmeng Zuo

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[70] arXiv:2404.16325 [pdf, other]: Title: Semantic Segmentation Refiner for Ultrasound Applications with Zero-Shot Foundation Models

Authors: Hedda Cohen Indelman, Elay Dahan, Angeles M. Perez-Agosto, Carmit Shiran, Doron Shaked, Nati Daniel

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[71] arXiv:2404.16323 [pdf, other]: Title: DIG3D: Marrying Gaussian Splatting with Deformable Transformer for Single Image 3D Reconstruction

Authors: Jiamin Wu, Kenkun Liu, Han Gao, Xiaoke Jiang, Lei Zhang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[72] arXiv:2404.16306 [pdf, other]: Title: TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models

Authors: Haomiao Ni, Bernhard Egger, Suhas Lohit, Anoop Cherian, Ye Wang, Toshiaki Koike-Akino, Sharon X. Huang, Tim K. Marks

Comments: CVPR 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[73] arXiv:2404.16304 [pdf, other]: Title: BezierFormer: A Unified Architecture for 2D and 3D Lane Detection

Authors: Zhiwei Dong, Xi Zhu, Xiya Cao, Ran Ding, Wei Li, Caifa Zhou, Yongliang Wang, Qiangbo Liu

Comments: ICME 2024, 11 pages, 8 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[74] arXiv:2404.16302 [pdf, other]: Title: CFMW: Cross-modality Fusion Mamba for Multispectral Object Detection under Adverse Weather Conditions

Authors: Haoyuan Li, Qi Hu, You Yao, Kailun Yang, Peng Chen

Comments: The dataset and source code will be made publicly available at this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Robotics (cs.RO); Image and Video Processing (eess.IV)
[75] arXiv:2404.16301 [pdf, other]: Title: Style Adaptation for Domain-adaptive Semantic Segmentation

Authors: Ting Li, Jianshu Chao, Deyu An

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[76] arXiv:2404.16296 [pdf, ps, other]: Title: Research on Splicing Image Detection Algorithms Based on Natural Image Statistical Characteristics

Authors: Ao Xiang, Jingyu Zhang, Qin Yang, Liyang Wang, Yu Cheng

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[77] arXiv:2404.16268 [pdf, other]: Title: Lacunarity Pooling Layers for Plant Image Classification using Texture Analysis

Authors: Akshatha Mohan, Joshua Peeples

Comments: 9 pages, 7 figures, accepted at 2024 IEEE/CVF Computer Vision and Pattern Recognition Vision for Agriculture Workshop

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[78] arXiv:2404.16266 [pdf, other]: Title: A Multi-objective Optimization Benchmark Test Suite for Real-time Semantic Segmentation

Authors: Yifan Zhao, Zhenyu Liang, Zhichao Lu, Ran Cheng

Comments: 8 pages, 16 figures, GECCO 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)
[79] arXiv:2404.16223 [pdf, other]: Title: Deep RAW Image Super-Resolution. A NTIRE 2024 Challenge Survey

Authors: Marcos V. Conde, Florin-Alexandru Vasluianu, Radu Timofte, Jianxing Zhang, Jia Li, Fan Wang, Xiaopeng Li, Zikun Liu, Hyunhee Park, Sejun Song, Changho Kim, Zhijuan Huang, Hongyuan Yu, Cheng Wan, Wending Xiang, Jiamin Lin, Hang Zhong, Qiaosong Zhang, Yue Sun, Xuanwu Yin, Kunlong Zuo, Senyan Xu, Siyuan Jiang, Zhijing Sun, Jiaying Zhu, Liangyan Li, Ke Chen, Yunzhe Li, Yimo Ning, Guanhua Zhao, Jun Chen, Jinyang Yu, Kele Xu, Qisheng Xu, Yong Dou

Comments: CVPR 2024 - NTIRE Workshop

Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[80] arXiv:2404.16222 [pdf, other]: Title: Step Differences in Instructional Video

Authors: Tushar Nagarajan, Lorenzo Torresani

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[81] arXiv:2404.16221 [pdf, other]: Title: NeRF-XL: Scaling NeRFs with Multiple GPUs

Authors: Ruilong Li, Sanja Fidler, Angjoo Kanazawa, Francis Williams

Comments: Webpage: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC); Graphics (cs.GR)
[82] arXiv:2404.16216 [pdf, other]: Title: ActiveRIR: Active Audio-Visual Exploration for Acoustic Environment Modeling

Authors: Arjun Somayazulu, Sagnik Majumder, Changan Chen, Kristen Grauman

Comments: Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[83] arXiv:2404.16205 [pdf, other]: Title: AIS 2024 Challenge on Video Quality Assessment of User-Generated Content: Methods and Results

Authors: Marcos V. Conde, Saman Zadtootaghaj, Nabajeet Barman, Radu Timofte, Chenlong He, Qi Zheng, Ruoxi Zhu, Zhengzhong Tu, Haiqiang Wang, Xiangguang Chen, Wenhui Meng, Xiang Pan, Huiying Shi, Han Zhu, Xiaozhong Xu, Lei Sun, Zhenzhong Chen, Shan Liu, Zicheng Zhang, Haoning Wu, Yingjie Zhou, Chunyi Li, Xiaohong Liu, Weisi Lin, Guangtao Zhai, Wei Sun, Yuqin Cao, Yanwei Jiang, Jun Jia, Zhichao Zhang, Zijian Chen, Weixia Zhang, Xiongkuo Min, Steve Göring, Zihao Qi, Chen Feng

Comments: CVPR 2024 Workshop -- AI for Streaming (AIS) Video Quality Assessment Challenge

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[84] arXiv:2404.16193 [pdf, other]: Title: Improving Multi-label Recognition using Class Co-Occurrence Probabilities

Authors: Samyak Rawlekar, Shubhang Bhatnagar, Vishnuvardhan Pogunulu Srinivasulu, Narendra Ahuja

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Image and Video Processing (eess.IV)
[85] arXiv:2404.16155 [pdf, other]: Title: Does SAM dream of EIG? Characterizing Interactive Segmenter Performance using Expected Information Gain

Authors: Kuan-I Chung, Daniel Moyer

Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Theory (cs.IT); Machine Learning (cs.LG)
[86] arXiv:2404.16139 [pdf, other]: Title: A Survey on Intermediate Fusion Methods for Collaborative Perception Categorized by Real World Challenges

Authors: Melih Yazgan, Thomas Graf, Min Liu, J. Marius Zoellner

Comments: 8 pages, 6 tables

Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[87] arXiv:2404.16136 [pdf, other]: Title: 3D Human Pose Estimation with Occlusions: Introducing BlendMimic3D Dataset and GCN Refinement

Authors: Filipa Lino, Carlos Santiago, Manuel Marques

Comments: Accepted at 6th Workshop and Competition on Affective Behavior Analysis in-the-wild - CVPR 2024 Workshop

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[88] arXiv:2404.16133 [pdf, ps, other]: Title: Quantitative Characterization of Retinal Features in Translated OCTA

Authors: Rashadul Hasan Badhon, Atalie Carina Thompson, Jennifer I. Lim, Theodore Leng, Minhaj Nur Alam

Comments: The article has been revised and edited

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[89] arXiv:2404.16123 [pdf, other]: Title: FairDeDup: Detecting and Mitigating Vision-Language Fairness Disparities in Semantic Dataset Deduplication

Authors: Eric Slyman, Stefan Lee, Scott Cohen, Kushal Kafle

Comments: Conference paper at CVPR 2024. 6 pages, 8 figures. Project Page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[90] arXiv:2404.16038 [pdf, other]: Title: A Survey on Generative AI and LLM for Video Generation, Understanding, and Streaming

Authors: Pengyuan Zhou, Lin Wang, Zhi Liu, Yanbin Hao, Pan Hui, Sasu Tarkoma, Jussi Kangasharju

Comments: 16 pages, 10 figures, 4 tables

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[91] arXiv:2404.16037 [pdf, other]: Title: VN-Net: Vision-Numerical Fusion Graph Convolutional Network for Sparse Spatio-Temporal Meteorological Forecasting

Authors: Yutong Xiong, Xun Zhu, Ming Wu, Weiqing Li, Fanbin Mo, Chuang Zhang, Bin Zhang

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Atmospheric and Oceanic Physics (physics.ao-ph)
[92] arXiv:2404.16823 (cross-list from cs.RO) [pdf, other]: Title: Learning Visuotactile Skills with Two Multifingered Hands

Authors: Toru Lin, Yu Zhang, Qiyang Li, Haozhi Qi, Brent Yi, Sergey Levine, Jitendra Malik

Comments: Code and Project Website: this https URL

Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[93] arXiv:2404.16767 (cross-list from cs.LG) [pdf, other]: Title: REBEL: Reinforcement Learning via Regressing Relative Rewards

Authors: Zhaolin Gao, Jonathan D. Chang, Wenhao Zhan, Owen Oertell, Gokul Swamy, Kianté Brantley, Thorsten Joachims, J. Andrew Bagnell, Jason D. Lee, Wen Sun

Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
[94] arXiv:2404.16718 (cross-list from eess.IV) [pdf, other]: Title: Features Fusion for Dual-View Mammography Mass Detection

Authors: Arina Varlamova, Valery Belotsky, Grigory Novikov, Anton Konushin, Evgeny Sidorov

Comments: Accepted at ISBI 2024 (21st IEEE International Symposium on Biomedical Imaging)

Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[95] arXiv:2404.16708 (cross-list from eess.IV) [pdf, other]: Title: Multi-view Cardiac Image Segmentation via Trans-Dimensional Priors

Authors: Abbas Khan, Muhammad Asad, Martin Benning, Caroline Roney, Gregory Slabaugh

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[96] arXiv:2404.16529 (cross-list from cs.RO) [pdf, other]: Title: Vision-based robot manipulation of transparent liquid containers in a laboratory setting

Authors: Daniel Schober, Ronja Güldenring, James Love, Lazaros Nalpantidis

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[97] arXiv:2404.16510 (cross-list from cs.GR) [pdf, other]: Title: Interactive3D: Create What You Want by Interactive 3D Generation

Authors: Shaocong Dong, Lihe Ding, Zhanpeng Huang, Zibin Wang, Tianfan Xue, Dan Xu

Comments: project page: this https URL

Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV)
[98] arXiv:2404.16482 (cross-list from q-bio.NC) [pdf, other]: Title: CoCoG: Controllable Visual Stimuli Generation based on Human Concept Representations

Authors: Chen Wei, Jiachen Zou, Dietmar Heinke, Quanying Liu

Subjects: Neurons and Cognition (q-bio.NC); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
[99] arXiv:2404.16397 (cross-list from eess.IV) [pdf, other]: Title: Deep Learning-based Prediction of Breast Cancer Tumor and Immune Phenotypes from Histopathology

Authors: Tiago Gonçalves, Dagoberto Pulido-Arias, Julian Willett, Katharina V. Hoebel, Mason Cleveland, Syed Rakin Ahmed, Elizabeth Gerstner, Jayashree Kalpathy-Cramer, Jaime S. Cardoso, Christopher P. Bridge, Albert E. Kim

Comments: Paper accepted at the First Workshop on Imageomics (Imageomics-AAAI-24) - Discovering Biological Knowledge from Images using AI (this https URL), held as part of the 38th Annual AAAI Conference on Artificial Intelligence (this https URL)

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)
[100] arXiv:2404.16346 (cross-list from eess.IV) [pdf, other]: Title: Light-weight Retinal Layer Segmentation with Global Reasoning

Authors: Xiang He, Weiye Song, Yiming Wang, Fabio Poiesi, Ji Yi, Manishi Desai, Quanqing Xu, Kongzheng Yang, Yi Wan

Comments: IEEE Transactions on Instrumentation & Measurement

Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[101] arXiv:2404.16336 (cross-list from cs.LG) [pdf, other]: Title: FedStyle: Style-Based Federated Learning Crowdsourcing Framework for Art Commissions

Authors: Changjuan Ran, Yeting Guo, Fang Liu, Shenglan Cui, Yunfan Ye

Comments: Accepted to ICME 2024

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[102] arXiv:2404.16307 (cross-list from cs.LG) [pdf, other]: Title: Boosting Model Resilience via Implicit Adversarial Data Augmentation

Authors: Xiaoling Zhou, Wei Ye, Zhemg Lee, Rui Xie, Shikun Zhang

Comments: 9 pages, 6 figures, accepted by IJCAI 2024

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[103] arXiv:2404.16300 (cross-list from cs.LG) [pdf, other]: Title: Reinforcement Learning with Generative Models for Compact Support Sets

Authors: Nico Schiavone, Xingyu Li

Comments: 4 pages, 2 figures. Code available at: this https URL

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[104] arXiv:2404.16292 (cross-list from cs.GR) [pdf, other]: Title: One Noise to Rule Them All: Learning a Unified Model of Spatially-Varying Noise Patterns

Authors: Arman Maesumi, Dylan Hu, Krishi Saripalli, Vladimir G. Kim, Matthew Fisher, Sören Pirk, Daniel Ritchie

Comments: In ACM Transactions on Graphics (Proceedings of SIGGRAPH) 2024, 21 pages

Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[105] arXiv:2404.16255 (cross-list from cs.CR) [pdf, other]: Title: Enhancing Privacy in Face Analytics Using Fully Homomorphic Encryption

Authors: Bharat Yalavarthi, Arjun Ramesh Kaushik, Arun Ross, Vishnu Boddeti, Nalini Ratha

Subjects: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
[106] arXiv:2404.16212 (cross-list from cs.CR) [pdf, other]: Title: An Analysis of Recent Advances in Deepfake Image Detection in an Evolving Threat Landscape

Authors: Sifat Muhammad Abdullah, Aravind Cheruvu, Shravya Kanchi, Taejoong Chung, Peng Gao, Murtuza Jadliwala, Bimal Viswanath

Comments: Accepted to IEEE S&P 2024; 19 pages, 10 figures

Subjects: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[107] arXiv:2404.16192 (cross-list from cs.CL) [pdf, other]: Title: Fusion of Domain-Adapted Vision and Language Models for Medical Visual Question Answering

Authors: Cuong Nhat Ha, Shima Asaadi, Sanjeev Kumar Karn, Oladimeji Farri, Tobias Heimann, Thomas Runkler

Comments: Clinical NLP @ NAACL 2024

Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
[108] arXiv:2404.16174 (cross-list from cs.HC) [pdf, other]: Title: MiMICRI: Towards Domain-centered Counterfactual Explanations of Cardiovascular Image Classification Models

Authors: Grace Guo, Lifu Deng, Animesh Tandon, Alex Endert, Bum Chul Kwon

Comments: 14 pages, 6 figures, ACM FAccT 2024

Subjects: Human-Computer Interaction (cs.HC); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[109] arXiv:2404.16112 (cross-list from cs.LG) [pdf, other]: Title: Mamba-360: Survey of State Space Models as Transformer Alternative for Long Sequence Modelling: Methods, Applications, and Challenges

Authors: Badri Narayana Patro, Vijay Srinivas Agneeswaran

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Image and Video Processing (eess.IV)
[110] arXiv:2404.16080 (cross-list from eess.IV) [pdf, other]: Title: Enhancing Diagnosis through AI-driven Analysis of Reflectance Confocal Microscopy

Authors: Hong-Jun Yoon, Chris Keum, Alexander Witkowski, Joanna Ludzik, Tracy Petrie, Heidi A. Hanson, Sancy A. Leachman

Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[111] arXiv:2404.16049 (cross-list from physics.med-ph) [pdf, other]: Title: Exploring the limitations of blood pressure estimation using the photoplethysmography signal

Authors: Felipe M. Dias, Diego A.C. Cardenas, Marcelo A.F. Toledo, Filipe A.C. Oliveira, Estela Ribeiro, Jose E. Krieger, Marco A. Gutierrez

Comments: 17 pages, 7 figures, 3 tables

Subjects: Medical Physics (physics.med-ph); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Signal Processing (eess.SP)
[112] arXiv:2404.15405 (cross-list from astro-ph.SR) [pdf, ps, other]: Title: Photometry of Saturated Stars with Machine Learning

Authors: Dominek Winecki (1), Christopher S. Kochanek (2) ((1) Dept. of Computer Science and Engineering, The Ohio State University, (2) Dept. of Astronomy, The Ohio State University)

Comments: submitted to ApJ

Subjects: Solar and Stellar Astrophysics (astro-ph.SR); Instrumentation and Methods for Astrophysics (astro-ph.IM); Computer Vision and Pattern Recognition (cs.CV)

Thu, 25 Apr 2024

[113] arXiv:2404.16035 [pdf, other]: Title: MaGGIe: Masked Guided Gradual Human Instance Matting

Authors: Chuong Huynh, Seoung Wug Oh, Abhinav Shrivastava, Joon-Young Lee

Comments: CVPR 2024. Project link: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[114] arXiv:2404.16033 [pdf, other]: Title: Cantor: Inspiring Multimodal Chain-of-Thought of MLLM

Authors: Timin Gao, Peixian Chen, Mengdan Zhang, Chaoyou Fu, Yunhang Shen, Yan Zhang, Shengchuan Zhang, Xiawu Zheng, Xing Sun, Liujuan Cao, Rongrong Ji

Comments: The project page is available at this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
[115] arXiv:2404.16030 [pdf, other]: Title: MoDE: CLIP Data Experts via Clustering

Authors: Jiawei Ma, Po-Yao Huang, Saining Xie, Shang-Wen Li, Luke Zettlemoyer, Shih-Fu Chang, Wen-Tau Yih, Hu Xu

Comments: IEEE CVPR 2024 Camera Ready. Code Link: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
[116] arXiv:2404.16029 [pdf, other]: Title: Editable Image Elements for Controllable Synthesis

Authors: Jiteng Mu, Michaël Gharbi, Richard Zhang, Eli Shechtman, Nuno Vasconcelos, Xiaolong Wang, Taesung Park

Comments: Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[117] arXiv:2404.16022 [pdf, other]: Title: PuLID: Pure and Lightning ID Customization via Contrastive Alignment

Authors: Zinan Guo, Yanze Wu, Zhuowei Chen, Lang Chen, Qian He

Comments: Tech Report. Codes and models will be available at this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[118] arXiv:2404.16017 [pdf, other]: Title: RetinaRegNet: A Versatile Approach for Retinal Image Registration

Authors: Vishal Balaji Sivaraman, Muhammad Imran, Qingyue Wei, Preethika Muralidharan, Michelle R. Tamplin, Isabella M. Grumbach, Randy H. Kardon, Jui-Kai Wang, Yuyin Zhou, Wei Shao

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG)
[119] arXiv:2404.16012 [pdf, other]: Title: GaussianTalker: Real-Time High-Fidelity Talking Head Synthesis with Audio-Driven 3D Gaussian Splatting

Authors: Kyusun Cho, Joungbin Lee, Heeji Yoon, Yeobin Hong, Jaehoon Ko, Sangjun Ahn, Seungryong Kim

Comments: Project Page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[120] arXiv:2404.16006 [pdf, other]: Title: MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI

Authors: Kaining Ying, Fanqing Meng, Jin Wang, Zhiqian Li, Han Lin, Yue Yang, Hao Zhang, Wenbo Zhang, Yuqi Lin, Shuo Liu, Jiayi Lei, Quanfeng Lu, Runjian Chen, Peng Xu, Renrui Zhang, Haozhe Zhang, Peng Gao, Yali Wang, Yu Qiao, Ping Luo, Kaipeng Zhang, Wenqi Shao

Comments: 77 pages, 41 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[121] arXiv:2404.16000 [pdf, other]: Title: A comprehensive and easy-to-use multi-domain multi-task medical imaging meta-dataset (MedIMeta)

Authors: Stefano Woerner, Arthur Jaques, Christian F. Baumgartner

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[122] arXiv:2404.15992 [pdf, other]: Title: HDDGAN: A Heterogeneous Dual-Discriminator Generative Adversarial Network for Infrared and Visible Image Fusion

Authors: Guosheng Lu, Zile Fang, Chunming He, Zhigang Zhao

Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[123] arXiv:2404.15979 [pdf, other]: Title: On the Fourier analysis in the SO(3) space: EquiLoPO Network

Authors: Dmitrii Zhemchuzhnikov, Sergei Grudinin

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Group Theory (math.GR)
[124] arXiv:2404.15956 [pdf, other]: Title: A Survey on Visual Mamba

Authors: Hanwei Zhang, Ying Zhu, Dan Wang, Lijun Zhang, Tianxiang Chen, Zi Ye

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[125] arXiv:2404.15955 [pdf, other]: Title: Beyond Deepfake Images: Detecting AI-Generated Videos

Authors: Danial Samadi Vahdati, Tai D. Nguyen, Aref Azizpour, Matthew C. Stamm

Comments: To be published in CVPRW24

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[126] arXiv:2404.15946 [pdf, ps, other]: Title: Mammo-CLIP: Leveraging Contrastive Language-Image Pre-training (CLIP) for Enhanced Breast Cancer Diagnosis with Multi-view Mammography

Authors: Xuxin Chen, Yuheng Li, Mingzhe Hu, Ella Salari, Xiaoqian Chen, Richard L.J. Qiu, Bin Zheng, Xiaofeng Yang

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)
[127] arXiv:2404.15909 [pdf, other]: Title: Learning Long-form Video Prior via Generative Pre-Training

Authors: Jinheng Xie, Jiajun Feng, Zhaoxu Tian, Kevin Qinghong Lin, Yawen Huang, Xi Xia, Nanxu Gong, Xu Zuo, Jiaqi Yang, Yefeng Zheng, Mike Zheng Shou

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[128] arXiv:2404.15903 [pdf, other]: Title: Drawing the Line: Deep Segmentation for Extracting Art from Ancient Etruscan Mirrors

Authors: Rafael Sterzinger, Simon Brenner, Robert Sablatnig

Comments: 19 pages, accepted at ICDAR2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[129] arXiv:2404.15891 [pdf, other]: Title: OMEGAS: Object Mesh Extraction from Large Scenes Guided by Gaussian Segmentation

Authors: Lizhi Wang, Feng Zhou, Jianqin Yin

Comments: arXiv admin note: text overlap with arXiv:2311.17061 by other authors

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[130] arXiv:2404.15889 [pdf, other]: Title: Sketch2Human: Deep Human Generation with Disentangled Geometry and Appearance Control

Authors: Linzi Qu, Jiaxiang Shang, Hui Ye, Xiaoguang Han, Hongbo Fu

Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
[131] arXiv:2404.15882 [pdf, ps, other]: Title: Unexplored Faces of Robustness and Out-of-Distribution: Covariate Shifts in Environment and Sensor Domains

Authors: Eunsu Baek, Keondo Park, Jiyoon Kim, Hyung-Sin Kim

Comments: Published as a conference paper at CVPR 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[132] arXiv:2404.15881 [pdf, other]: Title: Steal Now and Attack Later: Evaluating Robustness of Object Detection against Black-box Adversarial Attacks

Authors: Erh-Chung Chen, Pin-Yu Chen, I-Hsin Chung, Che-Rung Lee

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[133] arXiv:2404.15879 [pdf, other]: Title: Revisiting Out-of-Distribution Detection in LiDAR-based 3D Object Detection

Authors: Michael Kösel, Marcel Schreiber, Michael Ulrich, Claudius Gläser, Klaus Dietmayer

Comments: Accepted for publication at the 2024 35th IEEE Intelligent Vehicles Symposium (IV 2024), June 2-5, 2024, in Jeju Island, Korea

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[134] arXiv:2404.15851 [pdf, ps, other]: Title: Porting Large Language Models to Mobile Devices for Question Answering

Authors: Hannes Fassold

Comments: Accepted for ASPAI 2024 Conference

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[135] arXiv:2404.15817 [pdf, other]: Title: Vision Transformer-based Adversarial Domain Adaptation

Authors: Yahan Li, Yuan Wu

Comments: 6 pages

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[136] arXiv:2404.15815 [pdf, other]: Title: Single-View Scene Point Cloud Human Grasp Generation

Authors: Yan-Kang Wang, Chengyi Xing, Yi-Lin Wei, Xiao-Ming Wu, Wei-Shi Zheng

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[137] arXiv:2404.15812 [pdf, other]: Title: Facilitating Advanced Sentinel-2 Analysis Through a Simplified Computation of Nadir BRDF Adjusted Reflectance

Authors: David Montero, Miguel D. Mahecha, César Aybar, Clemens Mosig, Sebastian Wieneke

Comments: Submitted to FOSS4G Europe 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Instrumentation and Methods for Astrophysics (astro-ph.IM)
[138] arXiv:2404.15802 [pdf, other]: Title: Raformer: Redundancy-Aware Transformer for Video Wire Inpainting

Authors: Zhong Ji, Yimu Su, Yan Zhang, Jiacheng Hou, Yanwei Pang, Jungong Han

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[139] arXiv:2404.15790 [pdf, other]: Title: Leveraging Large Language Models for Multimodal Search

Authors: Oriol Barbany, Michael Huang, Xinliang Zhu, Arnab Dhua

Comments: Published at CVPRW 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[140] arXiv:2404.15789 [pdf, other]: Title: MotionMaster: Training-free Camera Motion Transfer For Video Generation

Authors: Teng Hu, Jiangning Zhang, Ran Yi, Yating Wang, Hongrui Huang, Jieyu Weng, Yabiao Wang, Lizhuang Ma

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[141] arXiv:2404.15785 [pdf, other]: Title: Seeing Beyond Classes: Zero-Shot Grounded Situation Recognition via Language Explainer

Authors: Jiaming Lei, Lin Li, Chunping Wang, Jun Xiao, Long Chen

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[142] arXiv:2404.15781 [pdf, other]: Title: Real-Time Compressed Sensing for Joint Hyperspectral Image Transmission and Restoration for CubeSat

Authors: Chih-Chung Hsu, Chih-Yu Jian, Eng-Shen Tu, Chia-Ming Lee, Guan-Lin Chen

Comments: Accepted by TGRS 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)
[143] arXiv:2404.15774 [pdf, other]: Title: Toward Physics-Aware Deep Learning Architectures for LiDAR Intensity Simulation

Authors: Vivek Anand, Bharat Lohani, Gaurav Pandey, Rakesh Mishra

Comments: 7 pages, 7 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)
[144] arXiv:2404.15771 [pdf, other]: Title: DVF: Advancing Robust and Accurate Fine-Grained Image Retrieval with Retrieval Guidelines

Authors: Xin Jiang, Hao Tang, Rui Yan, Jinhui Tang, Zechao Li

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[145] arXiv:2404.15770 [pdf, other]: Title: ChEX: Interactive Localization and Region Description in Chest X-rays

Authors: Philip Müller, Georgios Kaissis, Daniel Rueckert

Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
[146] arXiv:2404.15765 [pdf, other]: Title: 3D Face Morphing Attack Generation using Non-Rigid Registration

Authors: Jag Mohan Singh, Raghavendra Ramachandra

Comments: Accepted to 2024 18th International Conference on Automatic Face and Gesture Recognition (FG) as short paper

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[147] arXiv:2404.15743 [pdf, other]: Title: SRAGAN: Saliency Regularized and Attended Generative Adversarial Network for Chinese Ink-wash Painting Generation

Authors: Xiang Gao, Yuqi Zhang

Comments: 25 pages, 14 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[148] arXiv:2404.15736 [pdf, other]: Title: What Makes Multimodal In-Context Learning Work?

Authors: Folco Bertini Baldassini, Mustafa Shukor, Matthieu Cord, Laure Soulier, Benjamin Piwowarski

Comments: 20 pages, 16 figures. Accepted to CVPR 2024 Workshop on Prompting in Vision. Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[149] arXiv:2404.15734 [pdf, other]: Title: Fine-grained Spatial-temporal MLP Architecture for Metro Origin-Destination Prediction

Authors: Yang Liu, Binglin Chen, Yongsen Zheng, Guanbin Li, Liang Lin

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[150] arXiv:2404.15721 [pdf, other]: Title: SPARO: Selective Attention for Robust and Compositional Transformer Encodings for Vision

Authors: Ankit Vani, Bac Nguyen, Samuel Lavoie, Ranjay Krishna, Aaron Courville

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[151] arXiv:2404.15719 [pdf, other]: Title: HDBN: A Novel Hybrid Dual-branch Network for Robust Skeleton-based Action Recognition

Authors: Jinfu Liu, Baiqiao Yin, Jiaying Lin, Jiajun Wen, Yue Li, Mengyuan Liu

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[152] arXiv:2404.15714 [pdf, other]: Title: Ada-DF: An Adaptive Label Distribution Fusion Network For Facial Expression Recognition

Authors: Shu Liu, Yan Xu, Tongming Wan, Xiaoyan Kui

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[153] arXiv:2404.15709 [pdf, other]: Title: ViViDex: Learning Vision-based Dexterous Manipulation from Human Videos

Authors: Zerui Chen, Shizhe Chen, Cordelia Schmid, Ivan Laptev

Comments: Project Page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Robotics (cs.RO)
[154] arXiv:2404.15707 [pdf, other]: Title: ESR-NeRF: Emissive Source Reconstruction Using LDR Multi-view Images

Authors: Jinseo Jeong, Junseo Koo, Qimeng Zhang, Gunhee Kim

Comments: CVPR 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[155] arXiv:2404.15700 [pdf, other]: Title: MAS-SAM: Segment Any Marine Animal with Aggregated Features

Authors: Tianyu Yan, Zifu Wan, Xinhao Deng, Pingping Zhang, Yang Liu, Huchuan Lu

Comments: Accepted by IJCAI2024. More modifications may be performed

Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[156] arXiv:2404.15697 [pdf, other]: Title: DeepFeatureX Net: Deep Features eXtractors based Network for discriminating synthetic from real images

Authors: Orazio Pontorno (1), Luca Guarnera (1), Sebastiano Battiato (1) ((1) University of Catania)

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[157] arXiv:2404.15683 [pdf, other]: Title: AnoFPDM: Anomaly Segmentation with Forward Process of Diffusion Models for Brain MRI

Authors: Yiming Che, Fazle Rafsani, Jay Shah, Md Mahfuzur Rahman Siddiquee, Teresa Wu

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[158] arXiv:2404.15677 [pdf, other]: Title: CharacterFactory: Sampling Consistent Characters with GANs for Diffusion Models

Authors: Qinghe Wang, Baolu Li, Xiaomin Li, Bing Cao, Liqian Ma, Huchuan Lu, Xu Jia

Comments: Code will be released very soon: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[159] arXiv:2404.15672 [pdf, other]: Title: Representing Part-Whole Hierarchies in Foundation Models by Learning Localizability, Composability, and Decomposability from Anatomy via Self-Supervision

Authors: Mohammad Reza Hosseinzadeh Taher, Michael B. Gotway, Jianming Liang

Comments: Accepted at CVPR 2024 [main conference]

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[160] arXiv:2404.15655 [pdf, other]: Title: Multi-Modal Proxy Learning Towards Personalized Visual Multiple Clustering

Authors: Jiawei Yao, Qi Qian, Juhua Hu

Comments: Accepted by CVPR 2024. Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[161] arXiv:2404.15653 [pdf, other]: Title: CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data

Authors: Sachin Mehta, Maxwell Horton, Fartash Faghri, Mohammad Hossein Sekhavat, Mahyar Najibi, Mehrdad Farajtabar, Oncel Tuzel, Mohammad Rastegari

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
[162] arXiv:2404.15644 [pdf, other]: Title: Building-PCC: Building Point Cloud Completion Benchmarks

Authors: Weixiao Gao, Ravi Peters, Jantien Stoter

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[163] arXiv:2404.15638 [pdf, other]: Title: PriorNet: A Novel Lightweight Network with Multidimensional Interactive Attention for Efficient Image Dehazing

Authors: Yutong Chen, Zhang Wen, Chao Wang, Lei Gong, Zhongchao Yi

Comments: 8 pages, 4 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[164] arXiv:2404.15635 [pdf, other]: Title: A Real-time Evaluation Framework for Pedestrian's Potential Risk at Non-Signalized Intersections Based on Predicted Post-Encroachment Time

Authors: Tengfeng Lin, Zhixiong Jin, Seongjin Choi, Hwasoo Yeo

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[165] arXiv:2404.15608 [pdf, other]: Title: Understanding and Improving CNNs with Complex Structure Tensor: A Biometrics Study

Authors: Kevin Hernandez-Diaz, Josef Bigun, Fernando Alonso-Fernandez

Comments: preprint manuscript

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[166] arXiv:2404.15592 [pdf, other]: Title: ImplicitAVE: An Open-Source Dataset and Multimodal LLMs Benchmark for Implicit Attribute Value Extraction

Authors: Henry Peng Zou, Vinay Samuel, Yue Zhou, Weizhi Zhang, Liancheng Fang, Zihe Song, Philip S. Yu, Cornelia Caragea

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR); Machine Learning (cs.LG)
[167] arXiv:2404.15591 [pdf, other]: Title: Domain Adaptation for Learned Image Compression with Supervised Adapters

Authors: Alberto Presta, Gabriele Spadaro, Enzo Tartaglione, Attilio Fiandrotti, Marco Grangetto

Comments: 10 pages, published at the Data Compression Conference 2024 (DCC2024)

Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[168] arXiv:2404.15580 [pdf, other]: Title: MiM: Mask in Mask Self-Supervised Pre-Training for 3D Medical Image Analysis

Authors: Jiaxin Zhuang, Linshan Wu, Qiong Wang, Varut Vardhanabhuti, Lin Luo, Hao Chen

Comments: submitted to journal

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[169] arXiv:2404.15564 [pdf, other]: Title: Guided AbsoluteGrad: Magnitude of Gradients Matters to Explanation's Localization and Saliency

Authors: Jun Huang, Yan Liu

Comments: CAI2024 Camera-ready Submission

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
[170] arXiv:2404.15552 [pdf, other]: Title: Cross-Temporal Spectrogram Autoencoder (CTSAE): Unsupervised Dimensionality Reduction for Clustering Gravitational Wave Glitches

Authors: Yi Li, Yunan Wu, Aggelos K. Katsaggelos

Subjects: Computer Vision and Pattern Recognition (cs.CV); Instrumentation and Methods for Astrophysics (astro-ph.IM); Machine Learning (cs.LG); General Relativity and Quantum Cosmology (gr-qc)
[171] arXiv:2404.15523 [pdf, other]: Title: Understanding Hyperbolic Metric Learning through Hard Negative Sampling

Authors: Yun Yue, Fangzhou Lin, Guanyi Mou, Ziming Zhang

Comments: Published in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024. arXiv admin note: text overlap with arXiv:2203.10833 by other authors

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[172] arXiv:2404.15516 [pdf, other]: Title: Visual Delta Generator with Large Multi-modal Models for Semi-supervised Composed Image Retrieval

Authors: Young Kyun Jang, Donghyun Kim, Zihang Meng, Dat Huynh, Ser-Nam Lim

Comments: 15 pages

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[173] arXiv:2404.15506 [pdf, other]: Title: Metric3D v2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation

Authors: Mu Hu, Wei Yin, Chi Zhang, Zhipeng Cai, Xiaoxiao Long, Hao Chen, Kaixuan Wang, Gang Yu, Chunhua Shen, Shaojie Shen

Comments: Our project page is at this https URL. arXiv admin note: substantial text overlap with arXiv:2307.10984

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[174] arXiv:2404.15451 [pdf, other]: Title: CFPFormer: Feature-pyramid like Transformer Decoder for Segmentation and Detection

Authors: Hongyi Cai, Mohammad Mahdinur Rahman, Jingyu Wu, Yulun Deng

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[175] arXiv:2404.15449 [pdf, other]: Title: ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning

Authors: Weifeng Chen, Jiacheng Zhang, Jie Wu, Hefeng Wu, Xuefeng Xiao, Liang Lin

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[176] arXiv:2404.15447 [pdf, other]: Title: GLoD: Composing Global Contexts and Local Details in Image Generation

Authors: Moyuru Yamada

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[177] arXiv:2404.15445 [pdf, other]: Title: Deep multi-prototype capsule networks

Authors: Saeid Abbassi, Kamaledin Ghiasi-Shirazi, Ahad Harati

Subjects: Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)
[178] arXiv:2404.15436 [pdf, other]: Title: Iterative Cluster Harvesting for Wafer Map Defect Patterns

Authors: Alina Pleli, Simon Baeuerle, Michel Janus, Jonas Barth, Ralf Mikut, Hendrik P. A. Lensch

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[179] arXiv:2404.15406 [pdf, other]: Title: Wiki-LLaVA: Hierarchical Retrieval-Augmented Generation for Multimodal LLMs

Authors: Davide Caffagni, Federico Cocchi, Nicholas Moratelli, Sara Sarto, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

Comments: CVPR 2024 Workshop on What is Next in Multimodal Foundation Models

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multimedia (cs.MM)
[180] arXiv:2404.15385 [pdf, ps, other]: Title: Sum of Group Error Differences: A Critical Examination of Bias Evaluation in Biometric Verification and a Dual-Metric Measure

Authors: Alaa Elobaid, Nathan Ramoly, Lara Younes, Symeon Papadopoulos, Eirini Ntoutsi, Ioannis Kompatsiaris

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
[181] arXiv:2404.15383 [pdf, other]: Title: WANDR: Intention-guided Human Motion Generation

Authors: Markos Diomataris, Nikos Athanasiou, Omid Taheri, Xi Wang, Otmar Hilliges, Michael J. Black

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[182] arXiv:2404.15378 [pdf, other]: Title: Hierarchical Hybrid Sliced Wasserstein: A Scalable Metric for Heterogeneous Joint Distributions

Authors: Khai Nguyen, Nhat Ho

Comments: 24 pages, 11 figures, 4 tables

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Machine Learning (cs.LG); Machine Learning (stat.ML)
[183] arXiv:2404.15919 (cross-list from cs.LG) [pdf, other]: Title: An Element-Wise Weights Aggregation Method for Federated Learning

Authors: Yi Hu, Hanchi Ren, Chen Hu, Jingjing Deng, Xianghua Xie

Comments: 2023 IEEE International Conference on Data Mining Workshops (ICDMW)

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[184] arXiv:2404.15918 (cross-list from eess.IV) [pdf, other]: Title: Perception and Localization of Macular Degeneration Applying Convolutional Neural Network, ResNet and Grad-CAM

Authors: Tahmim Hossain, Sagor Chandro Bakchy

Comments: 12 pages, 5 figures, 2 tables

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[185] arXiv:2404.15847 (cross-list from physics.med-ph) [pdf, other]: Title: 3D Freehand Ultrasound using Visual Inertial and Deep Inertial Odometry for Measuring Patellar Tracking

Authors: Russell Buchanan, S. Jack Tu, Marco Camurri, Stephen J. Mellon, Maurice Fallon

Comments: Accepted to IEEE Medical Measurements & Applications (MeMeA) 2024

Subjects: Medical Physics (physics.med-ph); Computer Vision and Pattern Recognition (cs.CV)
[186] arXiv:2404.15786 (cross-list from eess.IV) [pdf, other]: Title: Rethinking Model Prototyping through the MedMNIST+ Dataset Collection

Authors: Sebastian Doerrich, Francesco Di Salvo, Julius Brockmann, Christian Ledig

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[187] arXiv:2404.15718 (cross-list from eess.IV) [pdf, other]: Title: Mitigating False Predictions In Unreasonable Body Regions

Authors: Constantin Ulrich, Catherine Knobloch, Julius C. Holzschuh, Tassilo Wald, Maximilian R. Rokuss, Maximilian Zenk, Maximilian Fischer, Michael Baumgartner, Fabian Isensee, Klaus H. Maier-Hein

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[188] arXiv:2404.15661 (cross-list from cs.GR) [pdf, other]: Title: CWF: Consolidating Weak Features in High-quality Mesh Simplification

Authors: Rui Xu, Longdu Liu, Ningna Wang, Shuangmin Chen, Shiqing Xin, Xiaohu Guo, Zichun Zhong, Taku Komura, Wenping Wang, Changhe Tu

Comments: 14 pages, 22 figures

Subjects: Graphics (cs.GR); Computational Geometry (cs.CG); Computer Vision and Pattern Recognition (cs.CV)
[189] arXiv:2404.15532 (cross-list from cs.HC) [pdf, other]: Title: BattleAgent: Multi-modal Dynamic Emulation on Historical Battles to Complement Historical Analysis

Authors: Shuhang Lin, Wenyue Hua, Lingyao Li, Che-Jui Chang, Lizhou Fan, Jianchao Ji, Hang Hua, Mingyu Jin, Jiebo Luo, Yongfeng Zhang

Comments: 26 pages, 14 figures. The data and code for this project are accessible at this https URL

Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Multiagent Systems (cs.MA)
[190] arXiv:2404.15394 (cross-list from eess.IV) [pdf, ps, other]: Title: On Generating Cancelable Biometric Template using Reverse of Boolean XOR

Authors: Manisha, Nitin Kumar

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[191] arXiv:2404.15367 (cross-list from eess.SP) [pdf, other]: Title: Leveraging Visibility Graphs for Enhanced Arrhythmia Classification with Graph Convolutional Networks

Authors: Rafael F. Oliveira, Gladston J. P. Moreira, Vander L. S. Freitas, Eduardo J. S. Luz

Subjects: Signal Processing (eess.SP); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[192] arXiv:2404.15364 (cross-list from eess.SP) [pdf, other]: Title: MP-DPD: Low-Complexity Mixed-Precision Neural Networks for Energy-Efficient Digital Predistortion of Wideband Power Amplifiers

Authors: Yizhuo Wu, Ang Li, Mohammadreza Beikmirza, Gagan Deep Singh, Qinyu Chen, Leo C. N. de Vreede, Morteza Alavi, Chang Gao

Comments: Accepted to IEEE Microwave and Wireless Technology Letters (MWTL)

Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[193] arXiv:2404.15346 (cross-list from eess.SP) [pdf, other]: Title: A Novel Micro-Doppler Coherence Loss for Deep Learning Radar Applications

Authors: Mikolaj Czerkawski, Christos Ilioudis, Carmine Clemente, Craig Michie, Ivan Andonovic, Christos Tachtatzis

Comments: Presented at 2021 18th European Radar Conference (EuRAD)

Subjects: Signal Processing (eess.SP); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[194] arXiv:2404.15318 (cross-list from q-bio.QM) [pdf, ps, other]: Title: VASARI-auto: equitable, efficient, and economical featurisation of glioma MRI

Authors: James K Ruffle, Samia Mohinta, Kelly Pegoretti Baruteau, Rebekah Rajiah, Faith Lee, Sebastian Brandner, Parashkev Nachev, Harpreet Hyare

Comments: 28 pages, 6 figures, 1 table

Subjects: Quantitative Methods (q-bio.QM); Computer Vision and Pattern Recognition (cs.CV); Tissues and Organs (q-bio.TO)
[195] arXiv:2404.15312 (cross-list from eess.SP) [pdf, other]: Title: Realtime Person Identification via Gait Analysis

Authors: Shanmuga Venkatachalam, Harideep Nair, Prabhu Vellaisamy, Yongqi Zhou, Ziad Youssfi, John Paul Shen

Subjects: Signal Processing (eess.SP); Computer Vision and Pattern Recognition (cs.CV)
[196] arXiv:2404.15287 (cross-list from eess.IV) [pdf, other]: Title: A Semi-automatic Cranial Implant Design Tool Based on Rigid ICP Template Alignment and Voxel Space Reconstruction

Authors: Michael Lackner, Behrus Puladi, Jens Kleesiek, Jan Egger, Jianning Li

Comments: 6 pages

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[197] arXiv:2404.14956 (cross-list from eess.IV) [pdf, other]: Title: DAWN: Domain-Adaptive Weakly Supervised Nuclei Segmentation via Cross-Task Interactions

Authors: Ye Zhang, Yifeng Wang, Zijie Fang, Hao Bian, Linghan Cai, Ziyue Wang, Yongbing Zhang

Comments: 13 pages, 11 figures, 8 tables

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Wed, 24 Apr 2024 (showing first 27 of 110 entries)

[198] arXiv:2404.15276 [pdf, other]: Title: SMPLer: Taming Transformers for Monocular 3D Human Shape and Pose Estimation

Authors: Xiangyu Xu, Lijuan Liu, Shuicheng Yan

Comments: Published at TPAMI 2024

Journal-ref: https://www.computer.org/csdl/journal/tp/2024/05/10354384/1SP2qWh8Fq0

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Machine Learning (cs.LG); Multimedia (cs.MM)
[199] arXiv:2404.15275 [pdf, other]: Title: ID-Animator: Zero-Shot Identity-Preserving Human Video Generation

Authors: Xuanhua He, Quande Liu, Shengju Qian, Xin Wang, Tao Hu, Ke Cao, Keyu Yan, Man Zhou, Jie Zhang

Comments: Project Page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[200] arXiv:2404.15272 [pdf, other]: Title: CT-GLIP: 3D Grounded Language-Image Pretraining with CT Scans and Radiology Reports for Full-Body Scenarios

Authors: Jingyang Lin, Yingda Xia, Jianpeng Zhang, Ke Yan, Le Lu, Jiebo Luo, Ling Zhang

Comments: 12 pages, 5 figures, 3 tables

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[201] arXiv:2404.15271 [pdf, other]: Title: Automatic Layout Planning for Visually-Rich Documents with Instruction-Following Models

Authors: Wanrong Zhu, Jennifer Healey, Ruiyi Zhang, William Yang Wang, Tong Sun

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[202] arXiv:2404.15267 [pdf, other]: Title: From Parts to Whole: A Unified Reference Framework for Controllable Human Image Generation

Authors: Zehuan Huang, Hongxing Fan, Lipeng Wang, Lu Sheng

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[203] arXiv:2404.15264 [pdf, other]: Title: TalkingGaussian: Structure-Persistent 3D Talking Head Synthesis via Gaussian Splatting

Authors: Jiahe Li, Jiawei Zhang, Xiao Bai, Jin Zheng, Xin Ning, Jun Zhou, Lin Gu

Comments: Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[204] arXiv:2404.15263 [pdf, other]: Title: Multi-Session SLAM with Differentiable Wide-Baseline Pose Optimization

Authors: Lahav Lipson, Jia Deng

Comments: Accepted to CVPR 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[205] arXiv:2404.15259 [pdf, other]: Title: FlowMap: High-Quality Camera Poses, Intrinsics, and Depth via Gradient Descent

Authors: Cameron Smith, David Charatan, Ayush Tewari, Vincent Sitzmann

Comments: Project website: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[206] arXiv:2404.15254 [pdf, other]: Title: UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition

Authors: Bin Wang, Zhuangcheng Gu, Chao Xu, Bo Zhang, Botian Shi, Conghui He

Comments: 17 pages, 5 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[207] arXiv:2404.15252 [pdf, other]: Title: Source-free Domain Adaptation for Video Object Detection Under Adverse Image Conditions

Authors: Xingguang Zhang, Chih-Hsien Chou

Comments: Accepted by the UG2+ workshop at CVPR 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[208] arXiv:2404.15244 [pdf, other]: Title: Efficient Transformer Encoders for Mask2Former-style models

Authors: Manyi Yao, Abhishek Aich, Yumin Suh, Amit Roy-Chowdhury, Christian Shelton, Manmohan Chandraker

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[209] arXiv:2404.15234 [pdf, other]: Title: Massively Annotated Datasets for Assessment of Synthetic and Real Data in Face Recognition

Authors: Pedro C. Neto, Rafael M. Mamede, Carolina Albuquerque, Tiago Gonçalves, Ana F. Sequeira

Comments: Accepted at FG 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[210] arXiv:2404.15228 [pdf, other]: Title: Re-Thinking Inverse Graphics With Large Language Models

Authors: Peter Kulits, Haiwen Feng, Weiyang Liu, Victoria Abrevaya, Michael J. Black

Comments: 31 pages; project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
[211] arXiv:2404.15224 [pdf, other]: Title: Deep Models for Multi-View 3D Object Recognition: A Review

Authors: Mona Alzahrani, Muhammad Usman, Salma Kammoun, Saeed Anwar, Tarek Helmy

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[212] arXiv:2404.15217 [pdf, other]: Title: Towards Large-Scale Training of Pathology Foundation Models

Authors: kaiko.ai, Nanne Aben, Edwin D. de Jong, Ioannis Gatopoulos, Nicolas Känzig, Mikhail Karasikov, Axel Lagré, Roman Moser, Joost van Doorn, Fei Tang

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[213] arXiv:2404.15212 [pdf, other]: Title: Real-time Lane-wise Traffic Monitoring in Optimal ROIs

Authors: Mei Qiu, Wei Lin, Lauren Ann Christopher, Stanley Chien, Yaobin Chen, Shu Hu

Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[214] arXiv:2404.15174 [pdf, other]: Title: Fourier-enhanced Implicit Neural Fusion Network for Multispectral and Hyperspectral Image Fusion

Authors: Yu-Jie Liang, Zihan Cao, Liang-Jian Deng, Xiao Wu

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[215] arXiv:2404.15163 [pdf, other]: Title: Adaptive Mixed-Scale Feature Fusion Network for Blind AI-Generated Image Quality Assessment

Authors: Tianwei Zhou, Songbai Tan, Wei Zhou, Yu Luo, Yuan-Gen Wang, Guanghui Yue

Comments: IEEE Transactions on Broadcasting (TBC)

Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[216] arXiv:2404.15161 [pdf, other]: Title: Combating Missing Modalities in Egocentric Videos at Test Time

Authors: Merey Ramazanova, Alejandro Pardo, Bernard Ghanem, Motasem Alfarra

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[217] arXiv:2404.15141 [pdf, other]: Title: CutDiffusion: A Simple, Fast, Cheap, and Strong Diffusion Extrapolation Method

Authors: Mingbao Lin, Zhihang Lin, Wengyi Zhan, Liujuan Cao, Rongrong Ji

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[218] arXiv:2404.15129 [pdf, ps, other]: Title: Gallbladder Cancer Detection in Ultrasound Images based on YOLO and Faster R-CNN

Authors: Sara Dadjouy, Hedieh Sajedi

Comments: Published in 2024 10th International Conference on Artificial Intelligence and Robotics (QICAR)

Journal-ref: 2024 10th International Conference on Artificial Intelligence and Robotics (QICAR) (pp. 227-231). IEEE

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[219] arXiv:2404.15127 [pdf, other]: Title: MedDr: Diagnosis-Guided Bootstrapping for Large-Scale Medical Vision-Language Learning

Authors: Sunan He, Yuxiang Nie, Zhixuan Chen, Zhiyuan Cai, Hongmei Wang, Shu Yang, Hao Chen

Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
[220] arXiv:2404.15100 [pdf, other]: Title: Multimodal Large Language Model is a Human-Aligned Annotator for Text-to-Image Generation

Authors: Xun Wu, Shaohan Huang, Furu Wei

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[221] arXiv:2404.15081 [pdf, other]: Title: Perturbing Attention Gives You More Bang for the Buck: Subtle Imaging Perturbations That Efficiently Fool Customized Diffusion Models

Authors: Jingyao Xu, Yuetong Lu, Yandong Li, Siyang Lu, Dongdong Wang, Xiang Wei

Comments: Published at CVPR 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
[222] arXiv:2404.15041 [pdf, other]: Title: LEAF: Unveiling Two Sides of the Same Coin in Semi-supervised Facial Expression Recognition

Authors: Fan Zhang, Zhi-Qi Cheng, Jian Zhao, Xiaojiang Peng, Xuelong Li

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[223] arXiv:2404.15037 [pdf, other]: Title: DP-Net: Learning Discriminative Parts for image recognition

Authors: Ronan Sicre, Hanwei Zhang, Julien Dejasmin, Chiheb Daaloul, Stéphane Ayache, Thierry Artières

Comments: IEEE ICIP 2023

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[224] arXiv:2404.15033 [pdf, other]: Title: IPAD: Industrial Process Anomaly Detection Dataset

Authors: Jinfan Liu, Yichao Yan, Junjie Li, Weiming Zhao, Pengzhi Chu, Xingdong Sheng, Yunhui Liu, Xiaokang Yang

Subjects: Computer Vision and Pattern Recognition (cs.CV)


