Image and Video Processing

New submissions

[ total of 13 entries: 1-13 ]

New submissions for Mon, 20 Jan 20

[1]  arXiv:2001.06236 [pdf]
Title: Detection Method Based on Automatic Visual Shape Clustering for Pin-Missing Defect in Transmission Lines
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Bolts are the most numerous fasteners in transmission lines and are prone to losing their split pins. Automatically detecting pin-missing defects on bolts, so that faults can be located and fixed promptly and efficiently, is a difficult problem and a long-term research goal for power systems. In this paper, we construct an automatic detection model for pin-missing defects called the Automatic Visual Shape Clustering Network (AVSCNet). First, we propose an unsupervised clustering method for the visual shapes of bolts and use it to build a defect detection model that can learn differences in visual shape. Next, three deep convolutional neural network optimization methods are incorporated into the model: feature enhancement, feature fusion, and region feature extraction. Defect detection results are obtained by applying regression and classification to the regional features. Object detection models based on different networks are tested on a pin-missing defect dataset built from aerial images of transmission lines captured at multiple locations, and are evaluated and thoroughly validated with several metrics. The results show that our method achieves quite satisfactory detection performance.

[2]  arXiv:2001.06342 [pdf, other]
Title: DeepSUM++: Non-local Deep Neural Network for Super-Resolution of Unregistered Multitemporal Images
Comments: arXiv admin note: text overlap with arXiv:1907.06490
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Deep learning methods for super-resolution of a remote sensing scene from multiple unregistered low-resolution images have recently gained attention thanks to a challenge proposed by the European Space Agency. This paper presents an evolution of the challenge-winning method, showing how incorporating non-local information into a convolutional neural network makes it possible to exploit self-similar patterns that provide enhanced regularization of the super-resolution problem. Experiments on the challenge dataset show improved performance over the state of the art, which does not exploit non-local information.
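The abstract does not detail the DeepSUM++ non-local layer. As a hedged illustration of the general idea only, the sketch below shows a generic non-local operation: each output position is a softmax-weighted average over all positions, with weights given by pairwise dot-product similarity, so that self-similar patterns anywhere in the input reinforce each other. Learned embeddings, the residual connection, and whatever specific non-local layer the paper uses are deliberately omitted; the function name `non_local` is hypothetical.

```python
import math

def non_local(features):
    """Simplified non-local operation over a list of feature vectors.

    Hypothetical sketch: identity embeddings, dot-product affinity,
    softmax normalization, no residual connection.
    """
    out = []
    for xi in features:
        # Pairwise affinity between position i and every position j.
        scores = [sum(a * b for a, b in zip(xi, xj)) for xj in features]
        m = max(scores)  # subtract the max before exp for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        dim = len(xi)
        # Output i is the softmax-weighted average of all feature vectors.
        out.append([sum(w * xj[d] for w, xj in zip(weights, features)) / total
                    for d in range(dim)])
    return out
```

For example, with features `[[0.0, 0.0], [2.0, 2.0]]` the first position has equal affinity to both positions, so its output is `[1.0, 1.0]`, while the second position is dominated by itself.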

[3]  arXiv:2001.06434 [pdf, other]
Title: Sinogram super-resolution and denoising convolutional neural network (SRCN) for limited data photoacoustic tomography
Subjects: Image and Video Processing (eess.IV); Medical Physics (physics.med-ph)

The quality of the reconstructed photoacoustic (PA) image largely depends on the amount of PA boundary data available, which in turn is proportional to the number of detectors employed. When data are limited (because fewer detectors are used owing to cost or instrumentation constraints), the reconstructed PA images suffer from artifacts and are often noisy. In this work, for the first time, a deep-learning-based model was developed to super-resolve and denoise photoacoustic sinogram data. The proposed method was compared with existing nearest-neighbor interpolation and wavelet-based denoising techniques and was shown to outperform both in numerical and in vivo cases. Compared with using the limited sinogram data directly, super-resolving and denoising the sinogram with the proposed network improved the Root Mean Square Error (RMSE) and Peak Signal-to-Noise Ratio (PSNR) of the reconstructed PA image by as much as 41.70% and 6.93 dB, respectively.
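The two metrics quoted above have standard definitions, sketched below for images flattened to lists. Two assumptions are made here, not taken from the abstract: the peak value defaults to 1.0 (intensities normalized to [0, 1]), and the RMSE "improvement" is read as a relative reduction in percent. The function names are hypothetical.

```python
import math

def rmse(ref, est):
    """Root mean square error between two equally sized flat sequences."""
    return math.sqrt(sum((r - e) ** 2 for r, e in zip(ref, est)) / len(ref))

def psnr(ref, est, peak=1.0):
    """Peak signal-to-noise ratio in dB; `peak` is the maximum possible value
    (assumed 1.0 for normalized images)."""
    return 20 * math.log10(peak / rmse(ref, est))

def rmse_improvement_percent(rmse_baseline, rmse_new):
    """Relative RMSE reduction in percent (assumed reading of 'improvement')."""
    return 100 * (rmse_baseline - rmse_new) / rmse_baseline
```

For instance, halving the RMSE gives `rmse_improvement_percent(2.0, 1.0) == 50.0`, and an RMSE of 1.0 against a peak of 10.0 gives a PSNR of 20 dB.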

Cross-lists for Mon, 20 Jan 20

[4]  arXiv:2001.06083 (cross-list from math.NA) [pdf, other]
Title: L1 data fitting for robust reconstruction in magnetic particle imaging: quantitative evaluation on Open MPI dataset
Subjects: Numerical Analysis (math.NA); Image and Video Processing (eess.IV)

Magnetic particle imaging is an emerging quantitative imaging modality that exploits the unique nonlinear magnetization behavior of superparamagnetic iron oxide nanoparticles to recover their concentration. Traditionally, the reconstruction is formulated as a penalized least-squares problem with a nonnegativity constraint and then solved with a variant of the Kaczmarz method. To achieve good performance, a preprocessing step of frequency selection is often adopted to remove the deleterious influence of highly noisy measurements. In this work, we propose a complementary approach to frequency selection: we view highly noisy measurements as outliers and employ l1 data fitting, a popular approach from robust statistics. The approach is easy to implement and has computational complexity comparable to that of the standard approach. Experiments with a public dataset, the Open MPI dataset, show that it gives accurate reconstructions and is less sensitive to noisy measurements, as clearly illustrated by quantitative (PSNR/SSIM) and qualitative comparisons with the standard approach.
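The real MPI reconstruction is a large linear system, and the abstract does not say which solver the authors use for the l1 problem. To show only why l1 fitting resists outliers, the hedged sketch below reduces the problem to a single nonnegative unknown c with measurements b_i ≈ a_i c. Least squares has a closed form that every outlier pulls on; the l1 fit minimizes sum_i |a_i| |c - b_i/a_i|, whose minimizer is a weighted median and therefore ignores a few gross outliers. Clamping at zero gives the constrained minimizer because both objectives are convex in c. All function names are hypothetical.

```python
def weighted_median(values, weights):
    """Return a minimizer of sum_i w_i * |c - v_i| for positive weights."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2
    acc = 0.0
    for v, w in pairs:
        acc += w
        if acc >= half:  # first value where cumulative weight reaches half
            return v
    return pairs[-1][0]

def fit_l2(a, b):
    """Nonnegative least-squares estimate of scalar c in b ≈ a * c."""
    c = sum(ai * bi for ai, bi in zip(a, b)) / sum(ai * ai for ai in a)
    return max(c, 0.0)  # projection onto c >= 0 (objective is convex)

def fit_l1(a, b):
    """Nonnegative l1 estimate minimizing sum_i |a_i * c - b_i| (a_i != 0).

    |a_i c - b_i| = |a_i| * |c - b_i / a_i|, so the minimizer is the
    weighted median of b_i / a_i with weights |a_i|.
    """
    c = weighted_median([bi / ai for ai, bi in zip(a, b)], [abs(ai) for ai in a])
    return max(c, 0.0)
```

With four clean measurements equal to 2 and one outlier equal to 100 (all a_i = 1), `fit_l1` returns 2.0 while `fit_l2` is pulled to about 21.6.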

[5]  arXiv:2001.06151 (cross-list from cs.CV) [pdf, other]
Title: Interpreting Galaxy Deblender GAN from the Discriminator's Perspective
Comments: 5 pages, 4 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Generative adversarial networks (GANs) are well known for their unsupervised learning capabilities. A recent success in astronomy is deblending two overlapping galaxy images with a branched GAN model. However, understanding how the network works remains a significant challenge, particularly for non-expert users. This research focuses on the behavior of one of the network's major components, the Discriminator, which plays a vital role but is often overlooked. Specifically, we enhance the Layer-wise Relevance Propagation (LRP) scheme to generate heatmap-based visualizations. We call this technique Polarized-LRP; it consists of two parts: positive contribution heatmaps for ground truth images and negative contribution heatmaps for generated images. Using the Galaxy Zoo dataset, we demonstrate that our method clearly reveals the areas the Discriminator attends to when differentiating generated galaxy images from ground truth images. To relate the Discriminator's influence to the Generator, we visualize how the Generator changes gradually over the training process. One notable result is the detection of a problematic data augmentation procedure that would otherwise have remained hidden. We find that the proposed method serves as a useful visual analytics tool for a deeper understanding of GAN models.
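The abstract does not give the Polarized-LRP propagation rules. As a hedged sketch, below is the standard LRP-ε rule for one dense layer z_j = sum_i a_i W[i][j] + b_j, which redistributes each output relevance R_j to the inputs in proportion to their contributions a_i W[i][j]. The `polarize` helper, which keeps only positive or only negative relevance, is hypothetical and reflects only one plausible reading of "positive/negative contribution heatmaps"; it is not the authors' method.

```python
def lrp_epsilon_dense(a, W, b, R_out, eps=1e-6):
    """Propagate relevance through a dense layer with the LRP-epsilon rule.

    a: input activations (length n_in); W: n_in x n_out weights;
    b: biases (length n_out); R_out: relevance of the outputs.
    """
    n_in, n_out = len(a), len(R_out)
    R_in = [0.0] * n_in
    for j in range(n_out):
        z = sum(a[i] * W[i][j] for i in range(n_in)) + b[j]
        denom = z + (eps if z >= 0 else -eps)  # stabilizer keeps the sign of z
        for i in range(n_in):
            # Share of output j's relevance attributed to input i.
            R_in[i] += a[i] * W[i][j] / denom * R_out[j]
    return R_in

def polarize(R, keep="positive"):
    """Hypothetical helper: keep only positive or only negative relevance."""
    if keep == "positive":
        return [max(r, 0.0) for r in R]
    return [min(r, 0.0) for r in R]
```

With zero bias, the input relevances sum approximately to the output relevance (the conservation property LRP is designed around); the ε term trades a small amount of conservation for numerical stability.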

[6]  arXiv:2001.06252 (cross-list from cs.CV) [pdf]
Title: Two-Phase Object-Based Deep Learning for Multi-temporal SAR Image Change Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Change detection is one of the fundamental applications of synthetic aperture radar (SAR) images. However, the speckle noise present in SAR images strongly degrades change detection. In this research, a novel two-phase object-based deep learning approach is proposed for multi-temporal SAR image change detection. Compared with traditional methods, the proposed approach brings two main innovations. The first is to classify all pixels into three categories rather than two: unchanged pixels, changed pixels caused by strong speckle (false changes), and changed pixels formed by real terrain variation (real changes). The second is to group neighboring pixels into superpixel objects so as to exploit local spatial context. The methodology consists of two phases: 1) Generate objects with the simple linear iterative clustering (SLIC) algorithm, and discriminate these objects into changed and unchanged classes using fuzzy c-means (FCM) clustering and a deep PCANet. The output of this phase is the set of changed and unchanged superpixels. 2) Apply deep learning to the pixel sets of only the changed superpixels obtained in the first phase, to discriminate real changes from false changes. SLIC is employed again to obtain new superpixels in the second phase. Low-rank and sparse decomposition is applied to these new superpixels to significantly suppress speckle noise. A further clustering step is applied to these new superpixels via FCM. A new PCANet is then trained to classify the two kinds of changed superpixels and produce the final change maps. Numerical experiments demonstrate that, compared with benchmark methods, the proposed approach effectively distinguishes real changes from false changes with significantly reduced false alarm rates, and achieves up to 99.71% change detection accuracy on multi-temporal SAR imagery.
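The FCM step used in both phases can be sketched on scalar values as below. This is a generic fuzzy c-means, not the paper's full pipeline: SLIC, low-rank and sparse decomposition, and PCANet are omitted, and the input is assumed (hypothetically) to be one scalar per object, such as a difference-image value, with at least two distinct values. Memberships follow u_ik = 1 / sum_j (d_ik/d_ij)^(2/(m-1)) and centers are means weighted by u_ik^m; the name `fcm_1d` and the evenly spaced initialization are assumptions.

```python
def fcm_1d(x, c=2, m=2.0, iters=100):
    """Fuzzy c-means on scalar values (c >= 2, m > 1, x not constant).

    Returns (centers, memberships); memberships[i][k] is the degree to
    which x[i] belongs to cluster k, and each row sums to 1.
    """
    lo, hi = min(x), max(x)
    # Deterministic initialization: centers spread evenly over [lo, hi].
    centers = [lo + (hi - lo) * k / (c - 1) for k in range(c)]
    p = 2.0 / (m - 1.0)
    U = []
    for _ in range(iters):
        # Membership update from distances to the current centers.
        U = []
        for xi in x:
            d = [abs(xi - v) for v in centers]
            if 0.0 in d:
                # Point coincides with a center: assign it fully to that cluster.
                k0 = d.index(0.0)
                U.append([1.0 if k == k0 else 0.0 for k in range(c)])
            else:
                U.append([1.0 / sum((d[k] / d[j]) ** p for j in range(c))
                          for k in range(c)])
        # Center update: mean of x weighted by u_ik^m.
        centers = [
            sum(U[i][k] ** m * x[i] for i in range(len(x)))
            / sum(U[i][k] ** m for i in range(len(x)))
            for k in range(c)
        ]
    return centers, U
```

Crisp changed/unchanged labels can then be read off as `[row.index(max(row)) for row in U]`; the fuzzy memberships are what make FCM useful for ambiguous, speckle-affected objects.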

[7]  arXiv:2001.06265 (cross-list from cs.CV) [pdf, other]
Title: SieveNet: A Unified Framework for Robust Image-Based Virtual Try-On
Comments: Accepted at IEEE WACV 2020
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)

Image-based virtual try-on for fashion has gained considerable attention recently. The task is to render a clothing item onto a target model image. An efficient framework for this consists of two stages: (1) warping (transforming) the try-on cloth to align with the pose and shape of the target model, and (2) a texture transfer module that seamlessly integrates the warped try-on cloth onto the target model image. Existing methods suffer from artifacts and distortions in their try-on output. In this work, we present SieveNet, a framework for robust image-based virtual try-on. First, we introduce a multi-stage coarse-to-fine warping network to better model fine-grained intricacies while transforming the try-on cloth, and train it with a novel perceptual geometric matching loss. Next, we introduce a segmentation mask prior conditioned on the try-on cloth to improve the texture transfer network. Finally, we introduce a dueling triplet loss strategy for training the texture translation network, which further improves the quality of the generated try-on results. We present extensive qualitative and quantitative evaluations of each component of the proposed pipeline and show significant performance improvements over the current state-of-the-art method.
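The abstract does not define the "dueling" variant. For orientation only, the standard triplet margin loss, which its name suggests it builds on, is sketched below: it is zero once the anchor is closer to the positive than to the negative by at least the margin, and grows linearly otherwise. How SieveNet selects anchors, positives, and negatives is not reproduced, and the margin of 1.0 is an arbitrary assumption.

```python
import math

def l2(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss (not SieveNet's dueling variant)."""
    return max(0.0, l2(anchor, positive) - l2(anchor, negative) + margin)
```

An anchor at `[0, 0]` with the positive at `[0, 0]` and the negative at `[3, 4]` already satisfies the margin, so the loss is 0.0; swapping positive and negative gives 5 + 1 = 6.0.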

[8]  arXiv:2001.06287 (cross-list from eess.SP) [pdf, other]
Title: Cellular-Connected Wireless Virtual Reality: Requirements, Challenges, and Solutions
Comments: 7 pages, 3 figures
Subjects: Signal Processing (eess.SP); Image and Video Processing (eess.IV)

Cellular-connected wireless connectivity provides new opportunities for virtual reality (VR) to offer a seamless user experience anywhere, at any time. To realize this vision, the quality of service (QoS) for wireless VR needs to be carefully defined to reflect human perception requirements. In this paper, we first identify the primary drivers of VR systems in terms of applications and use cases. We then map human perception requirements to corresponding QoS requirements for four phases of VR technology development. To shed light on how to provide short- and long-range mobility for VR services, we further list four main use cases for cellular-connected wireless VR and identify their unique research challenges, along with the corresponding enabling technologies and solutions in 5G systems and beyond. Finally, we present a case study demonstrating the effectiveness of our proposed solution and the unique QoS performance requirements of VR transmission compared with those of traditional video services in cellular networks.

[9]  arXiv:2001.06440 (cross-list from cs.CV) [pdf, other]
Title: Combining PRNU and noiseprint for robust and efficient device source identification
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

PRNU-based (photo-response non-uniformity) image processing is a key asset in digital multimedia forensics. It allows reliable device identification and effective detection and localization of image forgeries under very general conditions. However, performance degrades significantly in challenging conditions involving data of low quality or limited quantity, such as working on compressed and cropped images or estimating the camera PRNU pattern from only a few images. To boost the performance of PRNU-based analyses in such conditions, we propose to leverage the image noiseprint, a recently proposed camera-model fingerprint that has proved effective for several forensic tasks. Numerical experiments on datasets widely used for source identification show that the proposed method ensures a significant performance improvement in a wide range of challenging situations.
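In the usual PRNU pipeline, a noise residual W = I - denoise(I) is extracted from the query image I and correlated with I·K, where K is the estimated camera fingerprint; a high normalized correlation supports the claim that the camera produced the image. The hedged sketch below shows only that correlation test on flattened lists. The denoiser, fingerprint estimation, and the noiseprint fusion proposed in the paper are all omitted, and the function names are hypothetical.

```python
import math

def ncc(x, y):
    """Normalized cross-correlation (Pearson) of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den

def prnu_score(residual, image, fingerprint):
    """Correlate a noise residual W with I * K, the expected PRNU term."""
    expected = [i * k for i, k in zip(image, fingerprint)]
    return ncc(residual, expected)
```

In practice the score is compared against a threshold chosen for a target false-alarm rate; with few or heavily compressed images the residual is dominated by other noise and the score drops, which is the regime the paper targets.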

[10]  arXiv:2001.06466 (cross-list from cs.MM) [pdf, other]
Title: Low Latency Volumetric Video Edge Cloud Streaming
Comments: 13 pages, 8 figures
Subjects: Multimedia (cs.MM); Image and Video Processing (eess.IV)

Volumetric video is an emerging key technology for immersive representation of 3D spaces and objects. The enhanced immersion of volumetric videos enables new use cases such as streaming of six-degrees-of-freedom (6DoF) videos, in which users can freely change their position and orientation. However, rendering volumetric videos as 3D representations (mesh or point cloud) requires substantial computational power, and transmitting such volumetric data requires high bandwidth. To mitigate this issue, a feasible solution is to render a 2D view from the volumetric data at a cloud or edge server and stream it as a 2D video. However, network-based processing adds network and processing latency. To reduce the motion-to-photon latency, the future user pose must be predicted. We developed a 6DoF user movement prediction model for very low latency streaming services and investigated its potential to further reduce the motion-to-photon latency for different prediction windows. Our results show that, on average, the developed prediction model reduces the positional rendering errors caused by the motion-to-photon latency compared to a baseline system in which no prediction is performed.
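The abstract does not describe the prediction model itself. As a hedged point of reference, the simplest pose predictor, constant-velocity extrapolation of position over a prediction window, is sketched below; it is not the authors' model. The function name, the sample format, and the use of milliseconds are assumptions, and orientation (e.g., quaternion) prediction is omitted.

```python
def predict_position(samples, horizon):
    """Constant-velocity extrapolation of a 3D position.

    samples: list of (t, (x, y, z)) sorted by time; only the last two are used.
    horizon: prediction window, in the same time unit as t (e.g., ms).
    """
    (t0, p0), (t1, p1) = samples[-2], samples[-1]
    dt = t1 - t0
    # Velocity from the last two samples, extended `horizon` into the future.
    return tuple(b + (b - a) / dt * horizon for a, b in zip(p0, p1))
```

With samples at t = 0 and t = 10 at positions (0, 0, 0) and (1, 2, 3), a 20-unit window predicts (3, 6, 9). The prediction error of such a baseline grows with the window length whenever the user accelerates, which is why learned predictors are evaluated across different prediction windows.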

Replacements for Mon, 20 Jan 20

[11]  arXiv:1806.01340 (replaced) [pdf]
Title: Design of optimal illumination patterns in single-pixel imaging using image dictionaries
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

[12]  arXiv:1904.09257 (replaced) [pdf]
Title: Image Denosing In Underwater Acoustic Noise Using Discrete Wavelet Transform With Different Noise Level Estimation
Subjects: Image and Video Processing (eess.IV); Signal Processing (eess.SP)

[13]  arXiv:1911.00353 (replaced) [pdf]
Title: Does deep learning always outperform simple linear regression in optical imaging?
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)