Image and Video Processing
New submissions
[ showing up to 2000 entries per page: fewer | more ]
New submissions for Fri, 29 Mar 24
- [1] arXiv:2403.18873 [pdf, ps, other]
-
Title: Predicting risk of cardiovascular disease using retinal OCT imagingAuthors: Cynthia Maldonado-Garcia, Rodrigo Bonazzola, Enzo Ferrante, Thomas H Julian, Panagiotis I Sergouniotis, Nishant Ravikumara, Alejandro F FrangiComments: 18 pages for main manuscript, 7 figures, 2 pages for appendix and preprint for a journalSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
We investigated the potential of optical coherence tomography (OCT) as an additional imaging technique to predict future cardiovascular disease (CVD). We utilised a self-supervised deep learning approach based on Variational Autoencoders (VAE) to learn low-dimensional representations of high-dimensional 3D OCT images and to capture distinct characteristics of different retinal layers within the OCT image. A Random Forest (RF) classifier was subsequently trained using the learned latent features and participant demographic and clinical data, to differentiate between patients at risk of CVD events (MI or stroke) and non-CVD cases. Our predictive model, trained on multimodal data, was assessed based on its ability to correctly identify individuals likely to suffer from a CVD event(MI or stroke), within a 5-year interval after image acquisition. Our self-supervised VAE feature selection and multimodal Random Forest classifier differentiate between patients at risk of future CVD events and the control group with an AUC of 0.75, outperforming the clinically established QRISK3 score (AUC= 0.597). The choroidal layer visible in OCT images was identified as an important predictor of future CVD events using a novel approach to model explanability. Retinal OCT imaging provides a cost-effective and non-invasive alternative to predict the risk of cardiovascular disease and is readily accessible in optometry practices and hospitals.
- [2] arXiv:2403.18992 [pdf, ps, other]
-
Title: Tractography with T1-weighted MRI and associated anatomical constraints on clinical quality diffusion MRIAuthors: Tian Yu, Yunhe Li, Michael E. Kim, Chenyu Gao, Qi Yang, Leon Y. Cai, Susane M. Resnick, Lori L. Beason-Held, Daniel C. Moyer, Kurt G. Schilling, Bennett A. LandmanSubjects: Image and Video Processing (eess.IV)
Diffusion MRI (dMRI) streamline tractography, the gold standard for in vivo estimation of brain white matter (WM) pathways, has long been considered indicative of macroscopic relationships with WM microstructure. However, recent advances in tractography demonstrated that convolutional recurrent neural networks (CoRNN) trained with a teacher-student framework have the ability to learn and propagate streamlines directly from T1 and anatomical contexts. Training for this network has previously relied on high-resolution dMRI. In this paper, we generalize the training mechanism to traditional clinical resolution data, which allows generalizability across sensitive and susceptible study populations. We train CoRNN on a small subset of the Baltimore Longitudinal Study of Aging (BLSA), which better resembles clinical protocols. Then, we define a metric, termed the epsilon ball seeding method, to compare T1 tractography and traditional diffusion tractography at the streamline level. Under this metric, T1 tractography generated by CoRNN reproduces diffusion tractography with approximately two millimeters of error.
- [3] arXiv:2403.19203 [pdf, other]
-
Title: Single-Shared Network with Prior-Inspired Loss for Parameter-Efficient Multi-Modal Imaging Skin Lesion ClassificationComments: This paper have submitted to Journal for reviewSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
In this study, we introduce a multi-modal approach that efficiently integrates multi-scale clinical and dermoscopy features within a single network, thereby substantially reducing model parameters. The proposed method includes three novel fusion schemes.
Firstly, unlike current methods that usually employ two individual models for for clinical and dermoscopy modalities, we verified that multimodal feature can be learned by sharing the parameters of encoder while leaving the individual modal-specific classifiers.
Secondly, the shared cross-attention module can replace the individual one to efficiently interact between two modalities at multiple layers.
Thirdly, different from current methods that equally optimize dermoscopy and clinical branches, inspired by prior knowledge that dermoscopy images play a more significant role than clinical images, we propose a novel biased loss. This loss guides the single-shared network to prioritize dermoscopy information over clinical information, implicitly learning a better joint feature representation for the modal-specific task.
Extensive experiments on a well-recognized Seven-Point Checklist (SPC) dataset and a collected dataset demonstrate the effectiveness of our method on both CNN and Transformer structures. Furthermore, our method exhibits superiority in both accuracy and model parameters compared to currently advanced methods. - [4] arXiv:2403.19415 [pdf, ps, other]
-
Title: Brain-Shift: Unsupervised Pseudo-Healthy Brain Synthesis for Novel Biomarker Extraction in Chronic Subdural HematomaSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Chronic subdural hematoma (cSDH) is a common neurological condition characterized by the accumulation of blood between the brain and the dura mater. This accumulation of blood can exert pressure on the brain, potentially leading to fatal outcomes. Treatment options for cSDH are limited to invasive surgery or non-invasive management. Traditionally, the midline shift, hand-measured by experts from an ideal sagittal plane, and the hematoma volume have been the primary metrics for quantifying and analyzing cSDH. However, these approaches do not quantify the local 3D brain deformation caused by cSDH. We propose a novel method using anatomy-aware unsupervised diffeomorphic pseudo-healthy synthesis to generate brain deformation fields. The deformation fields derived from this process are utilized to extract biomarkers that quantify the shift in the brain due to cSDH. We use CT scans of 121 patients for training and validation of our method and find that our metrics allow the identification of patients who require surgery. Our results indicate that automatically obtained brain deformation fields might contain prognostic value for personalized cSDH treatment. Our implementation is available on: github.com/Barisimre/brain-morphing
- [5] arXiv:2403.19425 [pdf, ps, other]
-
Title: A Robust Ensemble Algorithm for Ischemic Stroke Lesion Segmentation: Generalizability and Clinical Utility Beyond the ISLES ChallengeAuthors: Ezequiel de la Rosa, Mauricio Reyes, Sook-Lei Liew, Alexandre Hutton, Roland Wiest, Johannes Kaesmacher, Uta Hanning, Arsany Hakim, Richard Zubal, Waldo Valenzuela, David Robben, Diana M. Sima, Vincenzo Anania, Arne Brys, James A. Meakin, Anne Mickan, Gabriel Broocks, Christian Heitkamp, Shengbo Gao, Kongming Liang, Ziji Zhang, Md Mahfuzur Rahman Siddiquee, Andriy Myronenko, Pooya Ashtari, Sabine Van Huffel, Hyun-su Jeong, Chi-ho Yoon, Chulhong Kim, Jiayu Huo, Sebastien Ourselin, Rachel Sparks, Albert Clèrigues, Arnau Oliver, Xavier Lladó, Liam Chalcroft, Ioannis Pappas, Jeroen Bertels, Ewout Heylen, Juliette Moreau, Nima Hatami, Carole Frindel, Abdul Qayyum, Moona Mazher, Domenec Puig, Shao-Chieh Lin, Chun-Jung Juan, Tianxi Hu, Lyndon Boone, Maged Goubran, Yi-Jui Liu, Susanne Wegener, et al. (7 additional authors not shown)Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Diffusion-weighted MRI (DWI) is essential for stroke diagnosis, treatment decisions, and prognosis. However, image and disease variability hinder the development of generalizable AI algorithms with clinical value. We address this gap by presenting a novel ensemble algorithm derived from the 2022 Ischemic Stroke Lesion Segmentation (ISLES) challenge. ISLES'22 provided 400 patient scans with ischemic stroke from various medical centers, facilitating the development of a wide range of cutting-edge segmentation algorithms by the research community. Through collaboration with leading teams, we combined top-performing algorithms into an ensemble model that overcomes the limitations of individual solutions. Our ensemble model achieved superior ischemic lesion detection and segmentation accuracy on our internal test set compared to individual algorithms. This accuracy generalized well across diverse image and disease variables. Furthermore, the model excelled in extracting clinical biomarkers. Notably, in a Turing-like test, neuroradiologists consistently preferred the algorithm's segmentations over manual expert efforts, highlighting increased comprehensiveness and precision. Validation using a real-world external dataset (N=1686) confirmed the model's generalizability. The algorithm's outputs also demonstrated strong correlations with clinical scores (admission NIHSS and 90-day mRS) on par with or exceeding expert-derived results, underlining its clinical relevance. This study offers two key findings. First, we present an ensemble algorithm (https://github.com/Tabrisrei/ISLES22_Ensemble) that detects and segments ischemic stroke lesions on DWI across diverse scenarios on par with expert (neuro)radiologists. Second, we show the potential for biomedical challenge outputs to extend beyond the challenge's initial objectives, demonstrating their real-world clinical applicability.
- [6] arXiv:2403.19508 [pdf, other]
-
Title: Debiasing Cardiac Imaging with Controlled Latent Diffusion ModelsAuthors: Grzegorz Skorupko, Richard Osuala, Zuzanna Szafranowska, Kaisar Kushibar, Nay Aung, Steffen E Petersen, Karim Lekadir, Polyxeni GkontraSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
The progress in deep learning solutions for disease diagnosis and prognosis based on cardiac magnetic resonance imaging is hindered by highly imbalanced and biased training data. To address this issue, we propose a method to alleviate imbalances inherent in datasets through the generation of synthetic data based on sensitive attributes such as sex, age, body mass index, and health condition. We adopt ControlNet based on a denoising diffusion probabilistic model to condition on text assembled from patient metadata and cardiac geometry derived from segmentation masks using a large-cohort study, specifically, the UK Biobank. We assess our method by evaluating the realism of the generated images using established quantitative metrics. Furthermore, we conduct a downstream classification task aimed at debiasing a classifier by rectifying imbalances within underrepresented groups through synthetically generated samples. Our experiments demonstrate the effectiveness of the proposed approach in mitigating dataset imbalances, such as the scarcity of younger patients or individuals with normal BMI level suffering from heart failure. This work represents a major step towards the adoption of synthetic data for the development of fair and generalizable models for medical classification tasks. Notably, we conduct all our experiments using a single, consumer-level GPU to highlight the feasibility of our approach within resource-constrained environments. Our code is available at https://github.com/faildeny/debiasing-cardiac-mri.
Cross-lists for Fri, 29 Mar 24
- [7] arXiv:2403.18826 (cross-list from q-bio.QM) [pdf, ps, other]
-
Title: SAM-dPCR: Real-Time and High-throughput Absolute Quantification of Biological Samples Using Zero-Shot Segment Anything ModelAuthors: Yuanyuan Wei, Shanhang Luo, Changran Xu, Yingqi Fu, Qingyue Dong, Yi Zhang, Fuyang Qu, Guangyao Cheng, Yi-Ping Ho, Ho-Pui Ho, Wu YuanComments: 23 pages, 6 figuresSubjects: Quantitative Methods (q-bio.QM); Image and Video Processing (eess.IV); Systems and Control (eess.SY)
Digital PCR (dPCR) has revolutionized nucleic acid diagnostics by enabling absolute quantification of rare mutations and target sequences. However, current detection methodologies face challenges, as flow cytometers are costly and complex, while fluorescence imaging methods, relying on software or manual counting, are time-consuming and prone to errors. To address these limitations, we present SAM-dPCR, a novel self-supervised learning-based pipeline that enables real-time and high-throughput absolute quantification of biological samples. Leveraging the zero-shot SAM model, SAM-dPCR efficiently analyzes diverse microreactors with over 97.7% accuracy within a rapid processing time of 3.16 seconds. By utilizing commonly available lab fluorescence microscopes, SAM-dPCR facilitates the quantification of sample concentrations. The accuracy of SAM-dPCR is validated by the strong linear relationship observed between known and inferred sample concentrations. Additionally, SAM-dPCR demonstrates versatility through comprehensive verification using various samples and reactor morphologies. This accessible, cost-effective tool transcends the limitations of traditional detection methods or fully supervised AI models, marking the first application of SAM in nucleic acid detection or molecular diagnostics. By eliminating the need for annotated training data, SAM-dPCR holds great application potential for nucleic acid quantification in resource-limited settings.
- [8] arXiv:2403.18878 (cross-list from cs.CV) [pdf, other]
-
Title: AIC-UNet: Anatomy-informed Cascaded UNet for Robust Multi-Organ SegmentationSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Imposing key anatomical features, such as the number of organs, their shapes, sizes, and relative positions, is crucial for building a robust multi-organ segmentation model. Current attempts to incorporate anatomical features include broadening effective receptive fields (ERF) size with resource- and data-intensive modules such as self-attention or introducing organ-specific topology regularizers, which may not scale to multi-organ segmentation problems where inter-organ relation also plays a huge role. We introduce a new approach to impose anatomical constraints on any existing encoder-decoder segmentation model by conditioning model prediction with learnable anatomy prior. More specifically, given an abdominal scan, a part of the encoder spatially warps a learnable prior to align with the given input scan using thin plate spline (TPS) grid interpolation. The warped prior is then integrated during the decoding phase to guide the model for more anatomy-informed predictions. Code is available at \hyperlink{https://anonymous.4open.science/r/AIC-UNet-7048}{https://anonymous.4open.science/r/AIC-UNet-7048}.
- [9] arXiv:2403.18908 (cross-list from cs.CV) [pdf, other]
-
Title: Enhancing Multiple Object Tracking Accuracy via Quantum AnnealingAuthors: Yasuyuki IharaComments: 19pages, 15 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV); Quantum Physics (quant-ph)
Multiple object tracking (MOT), a key task in image recognition, presents a persistent challenge in balancing processing speed and tracking accuracy. This study introduces a novel approach that leverages quantum annealing (QA) to expedite computation speed, while enhancing tracking accuracy through the ensembling of object tracking processes. A method to improve the matching integration process is also proposed. By utilizing the sequential nature of MOT, this study further augments the tracking method via reverse annealing (RA). Experimental validation confirms the maintenance of high accuracy with an annealing time of a mere 3 $\mu$s per tracking process. The proposed method holds significant potential for real-time MOT applications, including traffic flow measurement for urban traffic light control, collision prediction for autonomous robots and vehicles, and management of products mass-produced in factories.
- [10] arXiv:2403.19001 (cross-list from cs.CV) [pdf, other]
-
Title: Cross--domain Fiber Cluster Shape Analysis for Language Performance Cognitive Score PredictionAuthors: Yui Lo, Yuqian Chen, Dongnan Liu, Wan Liu, Leo Zekelman, Fan Zhang, Yogesh Rathi, Nikos Makris, Alexandra J. Golby, Weidong Cai, Lauren J. O'DonnellComments: 2 figures, 11 pagesSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV); Neurons and Cognition (q-bio.NC)
Shape plays an important role in computer graphics, offering informative features to convey an object's morphology and functionality. Shape analysis in brain imaging can help interpret structural and functionality correlations of the human brain. In this work, we investigate the shape of the brain's 3D white matter connections and its potential predictive relationship to human cognitive function. We reconstruct brain connections as sequences of 3D points using diffusion magnetic resonance imaging (dMRI) tractography. To describe each connection, we extract 12 shape descriptors in addition to traditional dMRI connectivity and tissue microstructure features. We introduce a novel framework, Shape--fused Fiber Cluster Transformer (SFFormer), that leverages a multi-head cross-attention feature fusion module to predict subject-specific language performance based on dMRI tractography. We assess the performance of the method on a large dataset including 1065 healthy young adults. The results demonstrate that both the transformer-based SFFormer model and its inter/intra feature fusion with shape, microstructure, and connectivity are informative, and together, they improve the prediction of subject-specific language performance scores. Overall, our results indicate that the shape of the brain's connections is predictive of human language function.
- [11] arXiv:2403.19083 (cross-list from cs.LG) [pdf, other]
-
Title: Improving Cancer Imaging Diagnosis with Bayesian Networks and Deep Learning: A Bayesian Deep Learning ApproachAuthors: Pei Xi (Alex)LinSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)
With recent advancements in the development of artificial intelligence applications using theories and algorithms in machine learning, many accurate models can be created to train and predict on given datasets. With the realization of the importance of imaging interpretation in cancer diagnosis, this article aims to investigate the theory behind Deep Learning and Bayesian Network prediction models. Based on the advantages and drawbacks of each model, different approaches will be used to construct a Bayesian Deep Learning Model, combining the strengths while minimizing the weaknesses. Finally, the applications and accuracy of the resulting Bayesian Deep Learning approach in the health industry in classifying images will be analyzed.
- [12] arXiv:2403.19158 (cross-list from cs.CV) [pdf, other]
-
Title: Uncertainty-Aware Deep Video Compression with EnsemblesComments: Published on IEEE Transactions on MultimediaSubjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Deep learning-based video compression is a challenging task, and many previous state-of-the-art learning-based video codecs use optical flows to exploit the temporal correlation between successive frames and then compress the residual error. Although these two-stage models are end-to-end optimized, the epistemic uncertainty in the motion estimation and the aleatoric uncertainty from the quantization operation lead to errors in the intermediate representations and introduce artifacts in the reconstructed frames. This inherent flaw limits the potential for higher bit rate savings. To address this issue, we propose an uncertainty-aware video compression model that can effectively capture the predictive uncertainty with deep ensembles. Additionally, we introduce an ensemble-aware loss to encourage the diversity among ensemble members and investigate the benefits of incorporating adversarial training in the video compression task. Experimental results on 1080p sequences show that our model can effectively save bits by more than 20% compared to DVC Pro.
- [13] arXiv:2403.19238 (cross-list from cs.CV) [pdf, other]
-
Title: Taming Lookup Tables for Efficient Image RetouchingSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)
The widespread use of high-definition screens in edge devices, such as end-user cameras, smartphones, and televisions, is spurring a significant demand for image enhancement. Existing enhancement models often optimize for high performance while falling short of reducing hardware inference time and power consumption, especially on edge devices with constrained computing and storage resources. To this end, we propose Image Color Enhancement Lookup Table (ICELUT) that adopts LUTs for extremely efficient edge inference, without any convolutional neural network (CNN). During training, we leverage pointwise (1x1) convolution to extract color information, alongside a split fully connected layer to incorporate global information. Both components are then seamlessly converted into LUTs for hardware-agnostic deployment. ICELUT achieves near-state-of-the-art performance and remarkably low power consumption. We observe that the pointwise network structure exhibits robust scalability, upkeeping the performance even with a heavily downsampled 32x32 input image. These enable ICELUT, the first-ever purely LUT-based image enhancer, to reach an unprecedented speed of 0.4ms on GPU and 7ms on CPU, at least one order faster than any CNN solution. Codes are available at https://github.com/Stephen0808/ICELUT.
- [14] arXiv:2403.19376 (cross-list from cs.CV) [pdf, other]
-
Title: NIGHT -- Non-Line-of-Sight Imaging from Indirect Time of Flight DataComments: Submitted to ECCV 24, 17 pages, 6 figures, 2 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)
The acquisition of objects outside the Line-of-Sight of cameras is a very intriguing but also extremely challenging research topic. Recent works showed the feasibility of this idea exploiting transient imaging data produced by custom direct Time of Flight sensors. In this paper, for the first time, we tackle this problem using only data from an off-the-shelf indirect Time of Flight sensor without any further hardware requirement. We introduced a Deep Learning model able to re-frame the surfaces where light bounces happen as a virtual mirror. This modeling makes the task easier to handle and also facilitates the construction of annotated training data. From the obtained data it is possible to retrieve the depth information of the hidden scene. We also provide a first-in-its-kind synthetic dataset for the task and demonstrate the feasibility of the proposed idea over it.
Replacements for Fri, 29 Mar 24
- [15] arXiv:2204.11970 (replaced) [pdf, other]
-
Title: Visual Acuity Prediction on Real-Life Patient Data Using a Machine Learning Based Multistage SystemAuthors: Tobias Schlosser, Frederik Beuth, Trixy Meyer, Arunodhayan Sampath Kumar, Gabriel Stolze, Olga Furashova, Katrin Engelmann, Danny KowerkoComments: Preprint for journal Scientific Reports (Springer)Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR); Machine Learning (cs.LG)
- [16] arXiv:2306.16324 (replaced) [pdf, other]
-
Title: DoseDiff: Distance-aware Diffusion Model for Dose Prediction in RadiotherapySubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
- [17] arXiv:2307.02496 (replaced) [pdf, ps, other]
-
Title: Learning to reconstruct the bubble distribution with conductivity maps using Invertible Neural Networks and Error DiffusionComments: Accepted for Oral presentation at WCIPT11 (11th World Congress on Industrial Process Tomography)Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [18] arXiv:2310.08080 (replaced) [pdf, ps, other]
-
Title: RT-SRTS: Angle-Agnostic Real-Time Simultaneous 3D Reconstruction and Tumor Segmentation from Single X-Ray ProjectionSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
- [19] arXiv:2402.19470 (replaced) [pdf, other]
-
Title: Towards Generalizable Tumor SynthesisComments: The IEEE / CVF Computer Vision and Pattern Recognition Conference (CVPR 2024)Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
- [20] arXiv:2403.11505 (replaced) [pdf, other]
-
Title: COVID-19 detection from pulmonary CT scans using a novel EfficientNet with attention mechanismAuthors: Ramy Farag, Parth Upadhyay, Yixiang Gao, Jacket Demby, Katherin Garces Montoya, Seyed Mohamad Ali Tousi, Gbenga Omotara, Guilherme DeSouzaSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [21] arXiv:2403.18339 (replaced) [pdf, other]
-
Title: H2ASeg: Hierarchical Adaptive Interaction and Weighting Network for Tumor Segmentation in PET/CT ImagesComments: 10 pages,4 figuresSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
- [22] arXiv:2204.08989 (replaced) [pdf, other]
-
Title: Efficient Deep Learning-based Estimation of the Vital Signs on SmartphonesComments: 10 pages, 8 figures, 11 tablesSubjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
- [23] arXiv:2305.19507 (replaced) [pdf, other]
-
Title: Manifold Constraint Regularization for Remote Sensing Image GenerationSubjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
- [24] arXiv:2401.03690 (replaced) [pdf, ps, other]
-
Title: So You Want to Image Myelin Using MRI: Magnetic Susceptibility Source Separation for Myelin ImagingComments: Accepted to Magnetic Resonance in Medical SciencesSubjects: Medical Physics (physics.med-ph); Image and Video Processing (eess.IV); Quantitative Methods (q-bio.QM)
[ showing up to 2000 entries per page: fewer | more ]
Disable MathJax (What is MathJax?)
Links to: arXiv, form interface, find, eess, recent, 2403, contact, help (Access key information)