
Image and Video Processing

New submissions

[ total of 16 entries: 1-16 ]

New submissions for Fri, 22 Jan 21

[1]  arXiv:2101.08309 [pdf, other]
Title: Chest X-ray lung and heart segmentation based on minimal training sets
Authors: Balázs Maga
Comments: Preprint. arXiv admin note: text overlap with arXiv:2003.10304
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

As the COVID-19 pandemic aggravated the excessive workload of doctors globally, the demand for computer-aided methods in medical imaging analysis increased even further. Such tools can result in more robust diagnostic pipelines which are less prone to human errors. In our paper, we present a deep neural network which we refer to as Attention BCDU-Net, and apply it to the task of lung and heart segmentation from chest X-ray (CXR) images, a basic but arduous step in the diagnostic pipeline, for instance for the detection of cardiomegaly. We show that the fine-tuned model exceeds previous state-of-the-art results, reaching $98.1\pm 0.1\%$ Dice score and $95.2\pm 0.1\%$ IoU score on the dataset of the Japanese Society of Radiological Technology (JSRT). Besides that, we demonstrate the relative simplicity of the task by attaining surprisingly strong results with training sets of size 10 and 20: in terms of Dice score, $97.0\pm 0.8\%$ and $97.3\pm 0.5\%$, respectively, while in terms of IoU score, $92.2\pm 1.2\%$ and $93.3\pm 0.4\%$, respectively. To achieve these scores, we capitalize on the mixup augmentation technique, which yields a remarkable gain above $4\%$ IoU score in the size 10 setup.
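The two metrics and the augmentation named in this abstract can be illustrated with a minimal sketch. It assumes flat binary masks stored as Python lists and the standard definitions of Dice, IoU and mixup. The function names, the `alpha` value and the choice to blend masks into soft labels are illustrative assumptions, not details taken from the paper.

```python
import random

def dice_score(pred, target):
    """Dice = 2|A ∩ B| / (|A| + |B|) for flat binary masks (lists of 0/1)."""
    inter = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 1.0 if total == 0 else 2.0 * inter / total

def iou_score(pred, target):
    """IoU = |A ∩ B| / |A ∪ B| for flat binary masks (lists of 0/1)."""
    inter = sum(p * t for p, t in zip(pred, target))
    union = sum(pred) + sum(target) - inter
    return 1.0 if union == 0 else inter / union

def mixup(x1, y1, x2, y2, alpha=0.4, rng=random):
    """Blend two (image, mask) pairs with a Beta(alpha, alpha) coefficient.

    alpha=0.4 is an arbitrary illustrative value; blending the masks into
    soft labels is an assumption about how mixup would apply to segmentation.
    """
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam

pred = [1, 1, 1, 0, 0, 0]
target = [1, 1, 0, 0, 0, 1]
print(dice_score(pred, target), iou_score(pred, target))
```

For a single pair of masks the two metrics are linked by Dice = 2·IoU / (1 + IoU), so Dice is never lower than IoU on the same prediction.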

[2]  arXiv:2101.08339 [pdf, other]
Title: Learning Ultrasound Rendering from Cross-Sectional Model Slices for Simulated Training
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Purpose. Given the high level of expertise required for navigation and interpretation of ultrasound images, computational simulations can facilitate the training of such skills in virtual reality. With ray-tracing based simulations, realistic ultrasound images can be generated. However, due to computational constraints for interactivity, image quality typically needs to be compromised.
Methods. We propose herein to bypass any rendering and simulation process at interactive time, by conducting such simulations during a non-time-critical offline stage and then learning image translation from cross-sectional model slices to such simulated frames. We use a generative adversarial framework with a dedicated generator architecture and input feeding scheme, which both substantially improve image quality without an increase in network parameters. Integral attenuation maps derived from cross-sectional model slices, texture-friendly strided convolutions, and the provision of stochastic noise and input maps to intermediate layers in order to preserve locality are all shown herein to greatly facilitate this translation task.
Results. Given several quality metrics, the proposed method with only tissue maps as input is shown to provide comparable or superior results to a state-of-the-art that uses additional images of low-quality ultrasound renderings. An extensive ablation study shows the need and benefits from the individual contributions utilized in this work, based on qualitative examples and quantitative ultrasound similarity metrics. To that end, a local histogram statistics based error metric is proposed and demonstrated for visualization of local dissimilarities between ultrasound images.

[3]  arXiv:2101.08502 [pdf]
Title: Weighted Fuzzy-Based PSNR for Watermarking
Comments: Five pages, 8 figures
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)

One of the problems of conventional visual quality evaluation criteria such as PSNR and MSE is the lack of appropriate standards based on the human visual system (HVS). They are calculated based on the difference of the corresponding pixels in the original and manipulated image. Hence, they practically do not provide a correct understanding of the image quality. Watermarking is an image processing application in which the image's visual quality is an essential criterion for its evaluation. Watermarking requires a criterion based on the HVS that provides more accurate values than conventional measures such as PSNR. This paper proposes a weighted fuzzy-based criterion that tries to find essential parts of an image based on the HVS. Then these parts will have larger weights in computing the final value of PSNR. We compare our results against standard PSNR, and our experiments show considerable consequences.
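A weighted PSNR of the kind described here can be sketched as follows. The sketch assumes the weights are already given as one non-negative number per pixel. How the paper derives them from its fuzzy HVS model is not stated in the abstract, so the `weights` argument and the function name are hypothetical.

```python
import math

def weighted_psnr(original, distorted, weights, max_value=255.0):
    """PSNR computed from a weighted mean squared error.

    original, distorted: flat lists of pixel intensities.
    weights: flat list of non-negative per-pixel weights (hypothetical input;
             larger values mark pixels assumed to matter more perceptually).
    """
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    wmse = sum(
        w * (o - d) ** 2 for o, d, w in zip(original, distorted, weights)
    ) / total_weight
    if wmse == 0:
        return math.inf  # identical images under these weights
    return 10.0 * math.log10(max_value ** 2 / wmse)

original = [100, 100, 100, 100]
distorted = [100, 100, 100, 110]
uniform = weighted_psnr(original, distorted, [1, 1, 1, 1])   # plain PSNR
emphasis = weighted_psnr(original, distorted, [1, 1, 1, 3])  # error pixel upweighted
print(uniform, emphasis)
```

With uniform weights the function reduces to standard PSNR. Upweighting the pixel that carries the error lowers the score, which is the behaviour a perceptually weighted criterion is meant to capture.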

[4]  arXiv:2101.08525 [pdf, other]
Title: GhostSR: Learning Ghost Features for Efficient Image Super-Resolution
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Modern single image super-resolution (SISR) systems based on convolutional neural networks (CNNs) achieve impressive performance but require huge computational costs. The problem of feature redundancy is well studied in visual recognition tasks, but rarely discussed in SISR. Based on the observation that many features in SISR models are also similar to each other, we propose to use the shift operation to generate the redundant features (i.e., Ghost features). Compared with depth-wise convolution, which is not friendly to GPUs or NPUs, the shift operation can bring practical inference acceleration for CNNs on common hardware. We analyze the benefits of the shift operation for SISR and make the shift orientation learnable based on the Gumbel-Softmax trick. For a given pre-trained model, we first cluster all filters in each convolutional layer to identify the intrinsic ones for generating intrinsic features. Ghost features will be derived by moving these intrinsic features along a specific orientation. The complete output features are constructed by concatenating the intrinsic and ghost features together. Extensive experiments on several benchmark models and datasets demonstrate that both the non-compact and lightweight SISR models embedded in our proposed module can achieve comparable performance to that of their baselines with a large reduction of parameters, FLOPs and GPU latency. For instance, we reduce the parameters by 47%, FLOPs by 46% and GPU latency by 41% of the EDSR x2 network without significant performance degradation.
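The idea of deriving ghost features by shifting intrinsic feature maps and concatenating both sets can be sketched in a few lines. This is only an illustration with fixed, hand-picked offsets and zero padding. The learnable orientation via Gumbel-Softmax and the filter clustering step are not reproduced, and the function names are hypothetical.

```python
def shift2d(feature, dx, dy):
    """Shift a 2D feature map by (dx, dy) pixels, zero-filling vacated cells.

    feature: list of rows (list of lists of floats).
    dx > 0 moves content right, dy > 0 moves content down.
    """
    h, w = len(feature), len(feature[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx  # source position for this output cell
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = feature[sy][sx]
    return out

def ghost_features(intrinsic, offsets):
    """Concatenate intrinsic maps with one shifted copy of each (channel-wise).

    offsets: one (dx, dy) pair per intrinsic map; fixed here, whereas the
             abstract describes a learned orientation.
    """
    ghosts = [shift2d(f, dx, dy) for f, (dx, dy) in zip(intrinsic, offsets)]
    return intrinsic + ghosts

feature = [[1.0, 2.0], [3.0, 4.0]]
print(ghost_features([feature], [(1, 0)]))
```

A shift involves no multiplications and no parameters, which is consistent with the reported reduction in parameters and FLOPs.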

[5]  arXiv:2101.08757 [pdf, other]
Title: Expectation-Maximization Regularized Deep Learning for Weakly Supervised Tumor Segmentation for Glioblastoma
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Quantitative Methods (q-bio.QM)

We present an Expectation-Maximization (EM) Regularized Deep Learning (EMReDL) model for weakly supervised tumor segmentation. The proposed framework was tailored to glioblastoma, a type of malignant tumor characterized by its diffuse infiltration into the surrounding brain tissue, which poses a significant challenge to treatment target and tumor burden estimation based on conventional structural MRI. Although physiological MRI can provide more specific information regarding tumor infiltration, the relatively low resolution hinders a precise full annotation. This has motivated us to develop a weakly supervised deep learning solution that exploits the partially labelled tumor regions.
EMReDL contains two components: a physiological prior prediction model and an EM-regularized segmentation model. The physiological prior prediction model exploits the physiological MRI by training a classifier to generate a physiological prior map. This map was passed to the segmentation model for regularization using the EM algorithm. We evaluated the model on a glioblastoma dataset with the available pre-operative multiparametric MRI and recurrence MRI. EMReDL was shown to effectively segment the infiltrated tumor from the partially labelled region of potential infiltration. The segmented core and infiltrated tumor showed high consistency with the tumor burden labelled by experts. The performance comparison showed that EMReDL achieved higher accuracy than published state-of-the-art models. On MR spectroscopy, the segmented region showed more aggressive features than other partially labelled regions. The proposed model can be generalized to other segmentation tasks with partial labels, with the CNN architecture flexible in the framework.

Cross-lists for Fri, 22 Jan 21

[6]  arXiv:2101.08345 (cross-list from cs.CV) [pdf, other]
Title: Nonparametric clustering for image segmentation
Authors: Giovanna Menardi
Journal-ref: Statistical Analysis and Data Mining, 13(1), 83-97 (2020)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV); Applications (stat.AP)

Image segmentation aims at identifying regions of interest within an image, by grouping pixels according to their properties. This task resembles the statistical one of clustering, yet many standard clustering methods fail to meet the basic requirements of image segmentation: segment shapes are often biased toward predetermined shapes and their number is rarely determined automatically. Nonparametric clustering is, in principle, free from these limitations and turns out to be particularly suitable for the task of image segmentation. This is also witnessed by several operational analogies, such as the use of topological data analysis and spatial tessellation in both frameworks. We discuss the application of nonparametric clustering to image segmentation and provide an algorithm specific for this task. Pixel similarity is evaluated in terms of density of the color representation and the adjacency structure of the pixels is exploited to introduce a simple, yet effective method to identify image segments as disconnected high-density regions. The proposed method works both to segment an image and to detect its boundaries and can be seen as a generalization to color images of the class of thresholding methods.

[7]  arXiv:2101.08398 (cross-list from cs.CV) [pdf, other]
Title: TDA-Net: Fusion of Persistent Homology and Deep Learning Features for COVID-19 Detection in Chest X-Ray Images
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)

Topological Data Analysis (TDA) has emerged recently as a robust tool to extract and compare the structure of datasets. TDA identifies features in data such as connected components and holes and assigns a quantitative measure to these features. Several studies reported that topological features extracted by TDA tools provide unique information about the data, discover new insights, and determine which feature is more related to the outcome. On the other hand, the overwhelming success of deep neural networks in learning patterns and relationships has been proven on a vast array of data applications, images in particular. To capture the characteristics of both powerful tools, we propose TDA-Net, a novel ensemble network that fuses topological and deep features for the purpose of enhancing model generalizability and accuracy. We apply the proposed TDA-Net to a critical application, which is the automated detection of COVID-19 from CXR images. The experimental results showed that the proposed network achieved excellent performance and suggest the applicability of our method in practice.

[8]  arXiv:2101.08427 (cross-list from cs.LG) [pdf, other]
Title: Analysis of Information Flow Through U-Nets
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Deep Neural Networks (DNNs) have become ubiquitous in medical image processing and analysis. Among them, U-Nets are very popular in various image segmentation tasks. Yet, little is known about how information flows through these networks and whether they are indeed properly designed for the tasks they are being proposed for. In this paper, we employ information-theoretic tools in order to gain insight into information flow through U-Nets. In particular, we show how mutual information between input/output and an intermediate layer can be a useful tool to understand information flow through various portions of a U-Net, assess its architectural efficiency, and even propose more efficient designs.
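Mutual information between a network input or output and an intermediate layer has to be estimated from samples. The sketch below uses a plug-in estimator on discrete (for example binned) values. The abstract does not say which estimator the authors use, so this is an assumed, generic choice, and the function name is hypothetical.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits from paired discrete samples.

    xs, ys: equal-length sequences of hashable values, e.g. quantised
            activations of two layers observed on the same inputs.
    """
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p_xy / (p_x * p_y) written with raw counts to limit rounding error
        mi += p_xy * math.log2(count * n / (px[x] * py[y]))
    return mi

print(mutual_information([0, 0, 1, 1], [0, 0, 1, 1]))  # Y determines X
print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))  # X and Y independent
```

Plug-in estimates are biased upward for small sample sizes and fine binning, so comparisons across layers are only meaningful when the same binning and sample count are used throughout.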

[9]  arXiv:2101.08661 (cross-list from cs.CV) [pdf, other]
Title: Regularization via deep generative models: an analysis point of view
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)

This paper proposes a new way of regularizing an inverse problem in imaging (e.g., deblurring or inpainting) by means of a deep generative neural network. Compared to end-to-end models, such approaches seem particularly interesting since the same network can be used for many different problems and experimental conditions, as soon as the generative model is suited to the data. Previous works proposed to use a synthesis framework, where the estimation is performed on the latent vector, the solution being obtained afterwards via the decoder. Instead, we propose an analysis formulation where we directly optimize the image itself and penalize the latent vector. We illustrate the interest of such a formulation by running experiments of inpainting, deblurring and super-resolution. In many cases our technique achieves a clear improvement of the performance and seems to be more robust, in particular with respect to initialization.

[10]  arXiv:2101.08694 (cross-list from astro-ph.IM) [pdf, other]
Title: Data Processing for Short-Term Solar Irradiance Forecasting using Ground-Based Infrared Images
Comments: arXiv admin note: text overlap with arXiv:2011.12401
Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); Image and Video Processing (eess.IV)

The generation of energy in a power grid which uses Photovoltaic (PV) systems depends on the projection of shadows from moving clouds in the Troposphere. This investigation proposes an efficient method of data processing for the statistical quantification of cloud features using long-wave infrared (IR) images and Global Solar Irradiance (GSI) measurements. The IR images are obtained using a data acquisition system (DAQ) mounted on a solar tracker. We explain how to remove cyclostationary biases in GSI measurements. Seasonal trends are removed from the GSI time series, using the theoretical GSI to obtain the Clear-Sky Index (CSI) time series. We introduce an atmospheric model to remove from IR images both the effect of atmosphere scatter irradiance and the effect of the Sun's direct irradiance. Scattering is produced by water spots and dust particles on the germanium lens of the enclosure. We explain how to remove the scattering effect produced by the germanium lens attached to the DAQ enclosure window of the IR camera. An atmospheric condition model classifies the sky-conditions in four different categories: clear-sky, cumulus, stratus and nimbus. When an IR image is classified in the category of clear-sky, it is used to model the scattering effect of the germanium lens.
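The step from GSI measurements to a Clear-Sky Index series can be sketched under the common convention that CSI is the ratio of measured to theoretical clear-sky irradiance. The abstract does not spell out the formula, so the ratio, the function name and the `min_clear_sky` guard are assumptions for illustration.

```python
import math

def clear_sky_index(measured_gsi, theoretical_gsi, min_clear_sky=1e-6):
    """Divide measured GSI by theoretical clear-sky GSI, sample by sample.

    Samples where the theoretical value is at or below min_clear_sky
    (e.g. at night) are returned as NaN instead of dividing by zero.
    """
    csi = []
    for measured, theoretical in zip(measured_gsi, theoretical_gsi):
        if theoretical > min_clear_sky:
            csi.append(measured / theoretical)
        else:
            csi.append(math.nan)
    return csi

# Values in W/m^2, made up for illustration only.
print(clear_sky_index([500.0, 800.0, 0.0], [1000.0, 800.0, 0.0]))
```

Because the theoretical irradiance already contains the daily and seasonal cycle, the ratio removes those trends and leaves mainly the variation caused by clouds.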

Replacements for Fri, 22 Jan 21

[11]  arXiv:2009.12597 (replaced) [pdf, other]
Title: Potential Features of ICU Admission in X-ray Images of COVID-19 Patients
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

[12]  arXiv:2101.03134 (replaced) [pdf, other]
Title: Explainable Systematic Analysis for Synthetic Aperture Sonar Imagery
Subjects: Image and Video Processing (eess.IV); Machine Learning (cs.LG)

[13]  arXiv:2101.03244 (replaced) [pdf, other]
Title: End-to-end Prostate Cancer Detection in bpMRI via 3D CNNs: Effect of Attention Mechanisms, Clinical Priori and Decoupled False Positive Reduction
Comments: Under Review at MedIA: Medical Image Analysis. This manuscript incorporates and expands upon our 2020 Medical Imaging Meets NeurIPS Workshop paper (arXiv:2011.00263)
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

[14]  arXiv:2101.07866 (replaced) [pdf, other]
Title: Classification of COVID-19 X-ray Images Using a Combination of Deep and Handcrafted Features
Comments: 5 pages, 5 figures
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

[15]  arXiv:1911.03462 (replaced) [pdf, other]
Title: Knowledge Distillation for Incremental Learning in Semantic Segmentation
Comments: Computer Vision and Image Understanding (CVIU), 2021. arXiv admin note: text overlap with arXiv:1907.13372
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)

[16]  arXiv:2010.16322 (replaced) [pdf, other]
Title: DeepWay: a Deep Learning Waypoint Estimator for Global Path Generation
Comments: Submitted to Computers and Electronics in Agriculture
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
