Computer Vision and Pattern Recognition

New submissions

New submissions for Thu, 23 Jan 20

[1]  arXiv:2001.07710 [pdf, other]
Title: An Image Enhancing Pattern-based Sparsity for Real-time Inference on Mobile Devices
Comments: arXiv admin note: text overlap with arXiv:1909.05073
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Image and Video Processing (eess.IV)

Weight pruning has been widely acknowledged as a straightforward and effective method to eliminate redundancy in Deep Neural Networks (DNN), thereby achieving acceleration on various platforms. However, most of the pruning techniques are essentially trade-offs between model accuracy and regularity, which lead to impaired inference accuracy and limited on-device acceleration performance. To solve the problem, we introduce a new sparsity dimension, namely pattern-based sparsity, which comprises pattern and connectivity sparsity and is both highly accurate and hardware friendly. With carefully designed patterns, the proposed pruning unprecedentedly and consistently achieves accuracy enhancement and better feature extraction ability on different DNN structures and datasets, and our pattern-aware pruning framework also achieves pattern library extraction, pattern selection, pattern and connectivity pruning and weight training simultaneously. Our approach on the new pattern-based sparsity naturally fits into compiler optimization for highly efficient DNN execution on mobile platforms. To the best of our knowledge, it is the first time that mobile devices achieve real-time inference for the large-scale DNN models thanks to the unique spatial property of pattern-based sparsity and the help of the code generation capability of compilers.

[2]  arXiv:2001.07739 [pdf, ps, other]
Title: EMOPAIN Challenge 2020: Multimodal Pain Evaluation from Facial and Bodily Expressions
Comments: 8 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)

The EmoPain 2020 Challenge is the first international competition aimed at creating a uniform platform for the comparison of machine learning and multimedia processing methods of automatic chronic pain assessment from human expressive behaviour, and also the identification of pain-related behaviours. The objective of the challenge is to promote research in the development of assistive technologies that help improve the quality of life for people with chronic pain via real-time monitoring and feedback to help manage their condition and remain physically active. The challenge also aims to encourage the use of the relatively underutilised, albeit vital bodily expression signals for automatic pain and pain-related emotion recognition. This paper presents a description of the challenge, competition guidelines, bench-marking dataset, and the baseline systems' architecture and performance on the three sub-tasks: pain estimation from facial expressions, pain recognition from multimodal movement, and protective movement behaviour detection.

[3]  arXiv:2001.07761 [pdf, other]
Title: Block-wise Scrambled Image Recognition Using Adaptation Network
Comments: 6 pages, Artificial Intelligence of Things (AAAI-2020 WS)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)

In this study, a perceptually hidden object-recognition method is investigated to generate secure images recognizable by humans but not machines. Hence, both the perceptual information hiding and the corresponding object recognition methods should be developed. Block-wise image scrambling is introduced to hide perceptual information from a third party. In addition, an adaptation network is proposed to recognize those scrambled images. Experimental comparisons conducted using CIFAR datasets demonstrated that the proposed adaptation network performed well in incorporating simple perceptual information hiding into DNN-based image classification.
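
As a rough illustration of block-wise scrambling, the sketch below permutes non-overlapping tiles of a small grayscale image with a keyed shuffle. This is only one simple reading of the idea: the abstract does not specify the scrambling operations, and the function names (`scramble_blocks`, `unscramble_blocks`) and the use of a seeded `random.Random` as the key are assumptions made for this example.

```python
import random


def _move_tiles(image, block, order, inverse=False):
    """Copy block x block tiles between the positions listed in `order`."""
    height, width = len(image), len(image[0])
    cols = width // block
    out = [[0] * width for _ in range(height)]
    for dst, src in enumerate(order):
        if inverse:
            dst, src = src, dst  # undo the forward mapping
        sr, sc = divmod(src, cols)
        dr, dc = divmod(dst, cols)
        for y in range(block):
            for x in range(block):
                out[dr * block + y][dc * block + x] = image[sr * block + y][sc * block + x]
    return out


def scramble_blocks(image, block, key):
    """Permute tiles of a 2D image (list of rows) using a key-seeded shuffle."""
    height, width = len(image), len(image[0])
    if height % block or width % block:
        raise ValueError("image size must be a multiple of the block size")
    order = list(range((height // block) * (width // block)))
    random.Random(key).shuffle(order)  # hypothetical keying scheme
    return _move_tiles(image, block, order), order


def unscramble_blocks(scrambled, block, order):
    """Restore the original tile layout given the permutation `order`."""
    return _move_tiles(scrambled, block, order, inverse=True)
```

Anyone holding the key (or the permutation) can invert the scrambling exactly, while the pixel values inside each tile are left unchanged.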

[4]  arXiv:2001.07766 [pdf, other]
Title: Adaptive Loss Function for Super Resolution Neural Networks Using Convex Optimization Techniques
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

The Single Image Super-Resolution (SISR) task refers to learning a mapping from low-resolution images to the corresponding high-resolution ones. This task is known to be extremely difficult since it is an ill-posed problem. Recently, Convolutional Neural Networks (CNNs) have achieved state-of-the-art performance on SISR. However, the images produced by CNNs do not contain fine details of the images. Generative Adversarial Networks (GANs) aim to solve this issue and recover sharp details. Nevertheless, GANs are notoriously difficult to train. Besides that, they generate artifacts in the high-resolution images. In this paper, we have proposed a method in which CNNs try to align images in different spaces rather than only the pixel space. Such a space is designed using convex optimization techniques. CNNs are encouraged to learn high-frequency components of the images as well as low-frequency components. We have shown that the proposed method can recover fine details of the images and it is stable in the training process.

[5]  arXiv:2001.07776 [pdf, other]
Title: Lesion Harvester: Iteratively Mining Unlabeled Lesions and Hard-Negative Examples at Scale
Comments: This work has been submitted to the IEEE for possible publication
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Acquiring large-scale medical image data, necessary for training machine learning algorithms, is frequently intractable, due to prohibitive expert-driven annotation costs. Recent datasets extracted from hospital archives, e.g., DeepLesion, have begun to address this problem. However, these are often incompletely or noisily labeled, e.g., DeepLesion leaves over 50% of its lesions unlabeled. Thus, effective methods to harvest missing annotations are critical for continued progress in medical image analysis. This is the goal of our work, where we develop a powerful system to harvest missing lesions from the DeepLesion dataset at high precision. Accepting the need for some degree of expert labor to achieve high fidelity, we exploit a small fully-labeled subset of medical image volumes and use it to intelligently mine annotations from the remainder. To do this, we chain together a highly sensitive lesion proposal generator and a very selective lesion proposal classifier. While our framework is generic, we optimize our performance by proposing a 3D contextual lesion proposal generator and by using a multi-view multi-scale lesion proposal classifier. These produce harvested and hard-negative proposals, which we then re-use to finetune our proposal generator by using a novel hard negative suppression loss, continuing this process until no extra lesions are found. Extensive experimental analysis demonstrates that our method can harvest an additional 9,805 lesions while keeping precision above 90%. To demonstrate the benefits of our approach, we show that lesion detectors trained on our harvested lesions can significantly outperform the same variants only trained on the original annotations, with a boost in average precision of 7% to 10%. We open source our code and annotations at https://github.com/JimmyCai91/DeepLesionAnnotation.

[6]  arXiv:2001.07791 [pdf, other]
Title: Deep Depth Prior for Multi-View Stereo
Subjects: Computer Vision and Pattern Recognition (cs.CV)

It was recently shown that the structure of convolutional neural networks induces a strong prior favoring natural color images, a phenomenon referred to as a deep image prior (DIP), which can be an effective regularizer in inverse problems such as image denoising, inpainting, etc. In this paper, we investigate a similar idea for depth images, which we call a deep depth prior. Specifically, given a color image and a noisy and incomplete target depth map from the same viewpoint, we optimize a randomly initialized CNN model to reconstruct an RGB-D image where the depth channel gets restored by virtue of using the network structure as a prior. We propose using deep depth priors for refining and inpainting noisy depth maps within a multi-view stereo pipeline. We optimize the network parameters to minimize two losses: 1) an RGB-D reconstruction loss based on the noisy depth map and 2) a multi-view photoconsistency-based loss, which is computed using images from a geometrically calibrated camera from nearby viewpoints. Our quantitative and qualitative evaluation shows that our refined depth maps are more accurate and complete, and after fusion, produce dense 3D models of higher quality.

[7]  arXiv:2001.07793 [pdf, other]
Title: Weakly Supervised Temporal Action Localization Using Deep Metric Learning
Comments: accepted to WACV 2020
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Temporal action localization is an important step towards video understanding. Most current action localization methods depend on untrimmed videos with full temporal annotations of action instances. However, it is expensive and time-consuming to annotate both action labels and temporal boundaries of videos. To this end, we propose a weakly supervised temporal action localization method that only requires video-level action instances as supervision during training. We propose a classification module to generate action labels for each segment in the video, and a deep metric learning module to learn the similarity between different action instances. We jointly optimize a balanced binary cross-entropy loss and a metric loss using a standard backpropagation algorithm. Extensive experiments demonstrate the effectiveness of both of these components in temporal localization. We evaluate our algorithm on two challenging untrimmed video datasets: THUMOS14 and ActivityNet1.2. Our approach improves the current state-of-the-art result for THUMOS14 by 6.5% mAP at IoU threshold 0.5, and achieves competitive performance for ActivityNet1.2.
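
The abstract does not define how the binary cross-entropy is balanced. As a hedged sketch, the function below shows one common class-balanced variant, in which the positive and negative terms are averaged separately so that each class contributes equally regardless of how many segments it has. The name `balanced_bce` and this particular weighting are assumptions for illustration, not the paper's stated formulation.

```python
import math


def balanced_bce(probs, labels, eps=1e-7):
    """Class-balanced binary cross-entropy over predicted probabilities.

    Positives and negatives are averaged separately, then combined with
    equal weight (an assumed balancing scheme, not taken from the paper).
    """
    pos = [-math.log(max(p, eps)) for p, t in zip(probs, labels) if t == 1]
    neg = [-math.log(max(1.0 - p, eps)) for p, t in zip(probs, labels) if t == 0]
    pos_term = sum(pos) / len(pos) if pos else 0.0
    neg_term = sum(neg) / len(neg) if neg else 0.0
    return 0.5 * (pos_term + neg_term)
```

With this weighting, adding more negative segments that carry the same predicted probability leaves the loss unchanged, which is the property that makes it useful when one class dominates.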

[8]  arXiv:2001.07799 [pdf]
Title: Scientific Image Tampering Detection Based On Noise Inconsistencies: A Method And Datasets
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Scientific image tampering is a problem that affects not only authors but also the general perception of the research community. Although previous researchers have developed methods to identify tampering in natural images, these methods may not thrive under the scientific setting as scientific images have different statistics, format, quality, and intentions. Therefore, we propose a scientific-image specific tampering detection method based on noise inconsistencies, which is capable of learning and generalizing to different fields of science. We train and test our method on a new dataset of manipulated western blot and microscopy imagery, which aims at emulating problematic images in science. The test results show that our method can detect various types of image manipulation in different scenarios robustly, and it outperforms existing general-purpose image tampering detection schemes. We discuss applications beyond these two types of images and suggest next steps for making detection of problematic images a systematic step in peer review and science in general.

[9]  arXiv:2001.07809 [pdf]
Title: Depth-Based Selective Blurring in Stereo Images Using Accelerated Framework
Comments: arXiv admin note: text overlap with arXiv:2001.06967
Journal-ref: 3D Research (Springer) 5, Article number: 14 (2014)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)

We propose a hybrid method for stereo disparity estimation by combining block and region-based stereo matching approaches. It generates dense depth maps from disparity measurements of only 18% of image pixels (left or right). The methodology involves segmenting pixel lightness values using fast K-Means implementation, refining segment boundaries using morphological filtering and connected components analysis; then determining boundaries' disparities using sum of absolute differences (SAD) cost function. Complete disparity maps are reconstructed from boundaries' disparities. We consider an application of our method for depth-based selective blurring of non-interest regions of stereo images, using Gaussian blur to de-focus users' non-interest regions. Experiments on Middlebury dataset demonstrate that our method outperforms traditional disparity estimation approaches using SAD and normalized cross correlation by up to 33.6% and some recent methods by up to 6.1%. Further, our method is highly parallelizable using CPU and GPU framework based on Java Thread Pool and APARAPI with speed-up of 5.8 for 250 stereo video frames (4,096 x 2,304).
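
To make the SAD cost function concrete, the following minimal sketch evaluates it for a single pixel of a rectified stereo pair and picks the disparity with the lowest cost. It covers only the matching cost, not the segmentation or reconstruction steps described above; the function names, the square window, and the convention that a left pixel at column x matches the right pixel at column x - d are assumptions for this example.

```python
def sad_cost(left, right, y, x, d, half):
    """Sum of absolute differences over a (2*half+1)^2 window at disparity d."""
    total = 0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            total += abs(left[y + dy][x + dx] - right[y + dy][x + dx - d])
    return total


def best_disparity(left, right, y, x, max_d, half=1):
    """Return the disparity in [0, max_d] with the smallest SAD cost.

    Disparities whose window would fall outside the right image are skipped.
    The caller must keep (y, x) at least `half` pixels away from the border.
    """
    candidates = [d for d in range(max_d + 1) if x - half - d >= 0]
    return min(candidates, key=lambda d: sad_cost(left, right, y, x, d, half))
```

Restricting this search to boundary pixels, as the abstract describes, is what would keep the number of SAD evaluations to a fraction of the image.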

[10]  arXiv:2001.07832 [pdf, other]
Title: LRF-Net: Learning Local Reference Frames for 3D Local Shape Description and Matching
Comments: 7 pages, 9 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

The local reference frame (LRF) plays a critical role in 3D local shape description and matching. However, most existing LRFs are hand-crafted and suffer from limited repeatability and robustness. This paper presents the first attempt to learn an LRF via a Siamese network that needs weak supervision only. In particular, we argue that each neighboring point in the local surface gives a unique contribution to LRF construction and measure such contributions via learned weights. Extensive analysis and comparative experiments on three public datasets addressing different application scenarios have demonstrated that LRF-Net is more repeatable and robust than several state-of-the-art LRF methods (LRF-Net is only trained on one dataset). In addition, LRF-Net can significantly boost the local shape description and 6-DoF pose estimation performance when matching 3D point clouds.

[11]  arXiv:2001.07871 [pdf]
Title: M^2 Deep-ID: A Novel Model for Multi-View Face Identification Using Convolutional Deep Neural Networks
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Despite significant advances in Deep Face Recognition (DFR) systems, introducing new DFRs under specific constraints such as varying pose remains a big challenge. In particular, due to the 3D nature of a human head, facial appearance of the same subject introduces a high intra-class variability when projected to the camera image plane. In this paper, we propose a new multi-view Deep Face Recognition (MVDFR) system to address the mentioned challenge. In this context, multiple 2D images of each subject under different views are fed into the proposed deep neural network with a unique design to re-express the facial features in a single and more compact face descriptor, which in turn, produces a more informative and abstract way for face identification using convolutional neural networks. To extend the functionality of our proposed system to multi-view facial images, the gold-standard Deep-ID model is modified in our proposed model. The experimental results indicate that our proposed method yields a 99.8% accuracy, while the state-of-the-art method achieves a 97% accuracy. We also gathered the Iran University of Science and Technology (IUST) face database with 6552 images of 504 subjects to accomplish our experiments.

[12]  arXiv:2001.07884 [pdf, other]
Title: Curvature Regularized Surface Reconstruction from Point Cloud
Comments: 22 pages, 15 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We propose a variational functional and fast algorithms to reconstruct an implicit surface from point cloud data with a curvature constraint. The minimizing functional balances the distance function from the point cloud and the mean curvature term. Only the point location is used, without any local normal or curvature estimation at each point. With the added curvature constraint, the computation becomes particularly challenging. To enhance the computational efficiency, we solve the problem by a novel operator splitting scheme. It replaces the original high-order PDEs by a decoupled PDE system, which is solved by a semi-implicit method. We also discuss an approach using an augmented Lagrangian method. The proposed method shows robustness against noise, and recovers concave features and sharp corners better compared to models without curvature constraint. Numerical experiments on two- and three-dimensional data sets, including noisy and sparse data, are presented to validate the model.

[13]  arXiv:2001.07895 [pdf, other]
Title: Partially-Shared Variational Auto-encoders for Unsupervised Domain Adaptation with Target Shift
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper proposes a novel approach for unsupervised domain adaptation (UDA) with target shift. Target shift is a problem of mismatch in label distribution between source and target domains. Typically it appears as class-imbalance in target domain. In practice, this is an important problem in UDA; as we do not know labels in target domain datasets, we do not know whether or not its distribution is identical to that in the source domain dataset. Many traditional approaches achieve UDA with distribution matching by minimizing maximum mean discrepancy or adversarial training; however these approaches implicitly assume a coincidence in the distributions and do not work under situations with target shift. Some recent UDA approaches focus on class boundary and some of them are robust to target shift, but they are only applicable to classification and not to regression.
To overcome the target shift problem in UDA, the proposed method, partially shared variational autoencoders (PS-VAEs), uses pair-wise feature alignment instead of feature distribution matching. PS-VAEs inter-convert domain of each sample by a CycleGAN-based architecture while preserving its label-related content. To evaluate the performance of PS-VAEs, we carried out two experiments: UDA with class-unbalanced digits datasets (classification), and UDA from synthesized data to real observation in human-pose-estimation (regression). The proposed method demonstrated its robustness against the class-imbalance in the classification task, and outperformed the other methods in the regression task by a large margin.
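
For readers unfamiliar with the distribution-matching baseline mentioned above, the discrepancy measure is commonly known as maximum mean discrepancy (MMD). The sketch below computes the standard biased estimate of squared MMD with a Gaussian kernel between two small sets of feature vectors. It illustrates the measure that such baselines minimize, not the PS-VAE method itself; the function names and the bandwidth default are assumptions for this example.

```python
import math


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel between two equal-length feature vectors."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of squared MMD between samples xs and ys.

    MMD^2 = E[k(x, x')] + E[k(y, y')] - 2 * E[k(x, y)]
    """
    def mean_kernel(ps, qs):
        total = sum(gaussian_kernel(p, q, bandwidth) for p in ps for q in qs)
        return total / (len(ps) * len(qs))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)
```

Because this quantity compares whole feature distributions, driving it to zero pushes the source and target features to coincide even when their label proportions differ, which is the implicit assumption the abstract identifies as failing under target shift.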

[14]  arXiv:2001.07904 [pdf, other]
Title: Dynamic multi-object Gaussian process models: A framework for data-driven functional modelling of human joints
Comments: 15 pages, 14 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Statistical shape models (SSMs) are state-of-the-art medical image analysis tools for extracting and explaining features across a set of biological structures. However, a principled and robust way to combine shape and pose features has been elusive due to three main issues: 1) Non-homogeneity of the data (data with linear and non-linear natural variation across features), 2) non-optimal representation of the 3D motion (rigid transformation representations that are not proportional to the kinetic energy that moves an object from one position to the other), and 3) artificial discretization of the models. In this paper, we propose a new dynamic multi-object statistical modelling framework for the analysis of human joints in a continuous domain. Specifically, we propose to normalise shape and dynamic spatial features in the same linearized statistical space permitting the use of linear statistics; we adopt an optimal 3D motion representation for more accurate rigid transformation comparisons; and we provide a 3D shape and pose prediction protocol using a Markov chain Monte Carlo sampling-based fitting. The framework affords an efficient generative dynamic multi-object modelling platform for biological joints. We validate the framework using controlled synthetic data. Finally, the framework is applied to an analysis of the human shoulder joint to compare its performance with standard SSM approaches in prediction of shape while adding the advantage of determining relative pose between bones in a complex. Excellent validity is observed and the shoulder joint shape-pose prediction results suggest that the novel framework may have utility for a range of medical image analysis applications. Furthermore, the framework is generic and can be extended to n > 2 objects, making it suitable for clinical and diagnostic methods for the management of joint disorders.

[15]  arXiv:2001.07926 [pdf, other]
Title: Optimized Generic Feature Learning for Few-shot Classification across Domains
Subjects: Computer Vision and Pattern Recognition (cs.CV)

To learn models or features that generalize across tasks and domains is one of the grand goals of machine learning. In this paper, we propose to use cross-domain, cross-task data as validation objective for hyper-parameter optimization (HPO) to improve on this goal. Given a rich enough search space, optimization of hyper-parameters learns features that maximize validation performance and, due to the objective, generalize across tasks and domains. We demonstrate the effectiveness of this strategy on few-shot image classification within and across domains. The learned features outperform all previous few-shot and meta-learning approaches.

[16]  arXiv:2001.07960 [pdf, other]
Title: A Fixation-based 360° Benchmark Dataset for Salient Object Detection
Comments: 5 pages, 5 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Fixation prediction (FP) in panoramic contents has been widely investigated along with the booming trend of virtual reality (VR) applications. However, another issue within the field of visual saliency, salient object detection (SOD), has been seldom explored in 360° (or omnidirectional) images due to the lack of datasets representative of real scenes with pixel-level annotations. Toward this end, we collect 107 equirectangular panoramas with challenging scenes and multiple object classes. Based on the consistency between FP and explicit saliency judgements, we further manually annotate 1,165 salient objects over the collected images with precise masks under the guidance of real human eye fixation maps. Six state-of-the-art SOD models are then benchmarked on the proposed fixation-based 360° image dataset (F-360iSOD), by applying a multiple cubic projection-based fine-tuning method. Experimental results show a limitation of the current methods when used for SOD in panoramic images, which indicates the proposed dataset is challenging. Key issues for 360° SOD are also discussed. The proposed dataset is available at https://github.com/Panorama-Bill/F-360iSOD.

[17]  arXiv:2001.07966 [pdf, other]
Title: ImageBERT: Cross-modal Pre-training with Large-scale Weak-supervised Image-Text Data
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this paper, we introduce a new vision-language pre-trained model -- ImageBERT -- for image-text joint embedding. Our model is a Transformer-based model, which takes different modalities as input and models the relationship between them. The model is pre-trained on four tasks simultaneously: Masked Language Modeling (MLM), Masked Object Classification (MOC), Masked Region Feature Regression (MRFR), and Image Text Matching (ITM). To further enhance the pre-training quality, we have collected a Large-scale weAk-supervised Image-Text (LAIT) dataset from Web. We first pre-train the model on this dataset, then conduct a second stage pre-training on Conceptual Captions and SBU Captions. Our experiments show that multi-stage pre-training strategy outperforms single-stage pre-training. We also fine-tune and evaluate our pre-trained ImageBERT model on image retrieval and text retrieval tasks, and achieve new state-of-the-art results on both MSCOCO and Flickr30k datasets.

[18]  arXiv:2001.08026 [pdf, other]
Title: ResDepth: Learned Residual Stereo Reconstruction
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We propose an embarrassingly simple, but very effective scheme for high-quality dense stereo reconstruction: (i) generate an approximate reconstruction with your favourite stereo matcher; (ii) rewarp the input images with that approximate model; and (iii) with the initial reconstruction and the warped images as input, train a deep network to enhance the reconstruction by regressing a residual correction. The strategy to only learn the residual greatly simplifies the learning problem. A standard Unet without bells and whistles is enough to reconstruct even small surface details, like dormers and roof substructures in satellite images. We also investigate residual reconstruction with less information and find that even a single image is enough to greatly improve an approximate reconstruction. Our full model reduces the mean absolute error of state-of-the-art stereo reconstruction systems by >50%, both in our target domain of satellite stereo and on stereo pairs from the ETH3D benchmark.

[19]  arXiv:2001.08047 [pdf, other]
Title: Attention! A Lightweight 2D Hand Pose Estimation Approach
Comments: submitted to IEEE Signal Processing Letters
Subjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)

Vision based human pose estimation is a non-invasive technology for Human-Computer Interaction (HCI). Direct use of the hand as an input device provides an attractive interaction method, with no need for specialized sensing equipment, such as exoskeletons, gloves etc, but a camera. Traditionally, HCI is employed in various applications spreading in areas including manufacturing, surgery, entertainment industry and architecture, to mention a few. Deployment of vision based human pose estimation algorithms can give a breath of innovation to these applications. In this letter, we present a novel Convolutional Neural Network architecture, reinforced with a Self-Attention module, that can be deployed on an embedded system, due to its lightweight nature, with just 1.9 Million parameters. The source code and qualitative results are publicly available.

[20]  arXiv:2001.08057 [pdf, other]
Title: Depthwise Non-local Module for Fast Salient Object Detection Using a Single Thread
Comments: Accepted as a regular paper in the IEEE Transactions on Cybernetics
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Recently, deep convolutional neural networks have achieved significant success in salient object detection. However, existing state-of-the-art methods require high-end GPUs to achieve real-time performance, which makes them hard to adapt to low-cost or portable devices. Although generic network architectures have been proposed to speed up inference on mobile devices, they are tailored to the task of image classification or semantic segmentation, and struggle to capture intra-channel and inter-channel correlations that are essential for contrast modeling in salient object detection. Motivated by the above observations, we design a new deep learning algorithm for fast salient object detection. The proposed algorithm for the first time achieves competitive accuracy and high inference efficiency simultaneously with a single CPU thread. Specifically, we propose a novel depthwise non-local module (DNL), which implicitly models contrast via harvesting intra-channel and inter-channel correlations in a self-attention manner. In addition, we introduce a depthwise non-local network architecture that incorporates both depthwise non-local modules and inverted residual blocks. Experimental results show that our proposed network attains very competitive accuracy on a wide range of salient object detection datasets while achieving state-of-the-art efficiency among all existing deep learning based algorithms.

[21]  arXiv:2001.08095 [pdf, other]
Title: UniPose: Unified Human Pose Estimation in Single Images and Videos
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We propose UniPose, a unified framework for human pose estimation, based on our "Waterfall" Atrous Spatial Pooling architecture, that achieves state-of-the-art results on several pose estimation metrics. Current pose estimation methods utilizing standard CNN architectures heavily rely on statistical postprocessing or predefined anchor poses for joint localization. UniPose incorporates contextual segmentation and joint localization to estimate the human pose in a single stage, with high accuracy, without relying on statistical postprocessing methods. The Waterfall module in UniPose leverages the efficiency of progressive filtering in the cascade architecture, while maintaining multi-scale fields-of-view comparable to spatial pyramid configurations. Additionally, our method is extended to UniPose-LSTM for multi-frame processing and achieves state-of-the-art results for temporal pose estimation in video. Our results on multiple datasets demonstrate that UniPose, with a ResNet backbone and Waterfall module, is a robust and efficient architecture for pose estimation obtaining state-of-the-art results in single person pose detection for both single images and videos.

[22]  arXiv:2001.08098 [pdf, other]
Title: Learning to Correct 3D Reconstructions from Multiple Views
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

This paper is about reducing the cost of building good large-scale 3D reconstructions post-hoc. We render 2D views of an existing reconstruction and train a convolutional neural network (CNN) that refines inverse-depth to match a higher-quality reconstruction. Since the views that we correct are rendered from the same reconstruction, they share the same geometry, so overlapping views complement each other. We take advantage of that in two ways. Firstly, we impose a loss during training which guides predictions on neighbouring views to have the same geometry and has been shown to improve performance. Secondly, in contrast to previous work, which corrects each view independently, we also make predictions on sets of neighbouring views jointly. This is achieved by warping feature maps between views and thus bypassing memory-intensive 3D computation. We make the observation that features in the feature maps are viewpoint-dependent, and propose a method for transforming features with dynamic filters generated by a multi-layer perceptron from the relative poses between views. In our experiments we show that this last step is necessary for successfully fusing feature maps between views.

[23]  arXiv:2001.08111 [pdf, other]
Title: Are Accelerometers for Activity Recognition a Dead-end?
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Accelerometer-based (and by extension other inertial sensors) research for Human Activity Recognition (HAR) is a dead-end. This sensor does not offer enough information for us to progress in the core domain of HAR---to recognize everyday activities from sensor data. Despite continued and prolonged efforts in improving feature engineering and machine learning models, the activities that we can recognize reliably have only expanded slightly and many of the same flaws of early models are still present today. Instead of relying on acceleration data, we should consider modalities with much richer information---images are a logical choice. With the rapid advance in image sensing hardware and modelling techniques, we believe that a widespread adoption of image sensors will open many opportunities for accurate and robust inference across a wide spectrum of human activities.
In this paper, we make the case for imagers in place of accelerometers as the default sensor for human activity recognition. Our review of past works has led to the observation that progress in HAR had stalled, caused by our reliance on accelerometers. We further argue for the suitability of images for activity recognition by illustrating their richness of information and the marked progress in computer vision. Through a feasibility analysis, we find that deploying imagers and CNNs on device poses no substantial burden on modern mobile hardware. Overall, our work highlights the need to move away from accelerometers and calls for further exploration of using imagers for activity recognition.

[24]  arXiv:2001.08173 [pdf]
Title: Causality based Feature Fusion for Brain Neuro-Developmental Analysis
Comments: 10 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV); Neurons and Cognition (q-bio.NC)

Human brain development is a complex and dynamic process that is affected by several factors such as genetics, sex hormones, and environmental changes. A number of recent studies on brain development have examined functional connectivity (FC) defined by the temporal correlation between time series of different brain regions. We propose to add the directional flow of information during brain maturation. To do so, we extract effective connectivity (EC) through Granger causality (GC) for two different groups of subjects, i.e., children and young adults. The motivation is that the inclusion of causal interaction may further discriminate brain connections between the two age groups and help to discover new connections between brain regions. The contributions of this study are threefold. First, there has been a lack of attention to EC-based feature extraction in the context of brain development. To this end, we propose a new kernel-based GC (KGC) method to learn the nonlinearity of complex brain networks, where a reduced Sine hyperbolic polynomial (RSP) neural network is used as our proposed learner. Second, we use causality values as the weights for the directional connectivity between brain regions. Our findings indicate that the strength of connections is significantly higher in young adults relative to children. In addition, our new EC-based feature outperforms FC-based analysis on the Philadelphia neurocohort (PNC) study, with better discrimination of the different age groups. Moreover, the fusion of these two sets of features (FC + EC) improves brain age prediction accuracy by more than 4%, indicating that they should be used together for brain development studies.

[25]  arXiv:2001.08188 [pdf, other]
Title: Discovering Salient Anatomical Landmarks by Predicting Human Gaze
Comments: Accepted at IEEE International Symposium on Biomedical Imaging 2020 (ISBI 2020)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)

Anatomical landmarks are a crucial prerequisite for many medical imaging tasks. Usually, the set of landmarks for a given task is predefined by experts. The landmark locations for a given image are then annotated manually or via machine learning methods trained on manual annotations. In this paper, in contrast, we present a method to automatically discover and localize anatomical landmarks in medical images. Specifically, we consider landmarks that attract the visual attention of humans, which we term visually salient landmarks. We illustrate the method for fetal neurosonographic images. First, full-length clinical fetal ultrasound scans are recorded with live sonographer gaze-tracking. Next, a convolutional neural network (CNN) is trained to predict the gaze point distribution (saliency map) of the sonographers on scan video frames. The CNN is then used to predict saliency maps of unseen fetal neurosonographic images, and the landmarks are extracted as the local maxima of these saliency maps. Finally, the landmarks are matched across images by clustering the landmark CNN features. We show that the discovered landmarks can be used within affine image registration, with average landmark alignment errors between 4.1% and 10.9% of the fetal head long axis length.
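
The landmark extraction step described above (taking the local maxima of a predicted saliency map) can be illustrated with a minimal sketch. The function name `local_maxima`, the 8-connected neighbourhood, the `threshold` parameter, and the toy map are all assumptions made for illustration; the abstract does not specify how maxima are detected.

```python
def local_maxima(saliency, threshold=0.0):
    """Return (row, col) cells that exceed the threshold and are strictly
    greater than every 8-connected neighbour (hypothetical helper)."""
    rows, cols = len(saliency), len(saliency[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            value = saliency[r][c]
            if value <= threshold:
                continue
            # Collect the values of all in-bounds neighbours of (r, c).
            neighbours = [
                saliency[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(value > n for n in neighbours):
                peaks.append((r, c))
    return peaks


# Toy 4x4 saliency map standing in for a CNN prediction (assumed values).
saliency_map = [
    [0.0, 0.1, 0.0, 0.0],
    [0.1, 0.9, 0.2, 0.0],
    [0.0, 0.2, 0.1, 0.3],
    [0.0, 0.0, 0.3, 0.8],
]
landmarks = local_maxima(saliency_map, threshold=0.5)
print(landmarks)
```

In practice a saliency map would likely be smoothed before peak detection, and matching landmarks across images would need the clustering step that the abstract mentions, which this sketch does not cover.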

[26]  arXiv:2001.08189 [pdf]
Title: Automatic phantom test pattern classification through transfer learning with deep neural networks
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Medical Physics (physics.med-ph)

Imaging phantoms are test patterns used to measure image quality in computed tomography (CT) systems. A new phantom platform (Mercury Phantom, Gammex) provides test patterns for estimating the task transfer function (TTF) or noise power spectrum (NPS) and simulates different patient sizes. Determining which image slices are suitable for analysis currently requires manual annotation of these patterns by an expert, as subtle defects may make an image unsuitable for measurement. We propose a method of automatically classifying these test patterns in a series of phantom images using deep learning techniques. By adapting a convolutional neural network based on the VGG19 architecture with weights trained on ImageNet, we use transfer learning to produce a classifier for this domain. The classifier is trained and evaluated with over 3,500 phantom images acquired at a university medical center. Input channels for color images are successfully adapted to convey contextual information for phantom images. A series of ablation studies is employed to verify design aspects of the classifier and evaluate its performance under varying training conditions. Our solution makes extensive use of image augmentation to produce a classifier that classifies typical phantom images with 98% accuracy, while maintaining as much as 86% accuracy when the phantom is improperly imaged.

[27]  arXiv:2001.08202 [pdf]
Title: RDAnet: A Deep Learning Based Approach for Synthetic Aperture Radar Image Formation
Authors: Andrew Rittenbach (1), John Paul Walters (1) ((1) University of Southern California Information Sciences Institute, Arlington VA)
Comments: 8 pages, 5 figures
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Synthetic Aperture Radar (SAR) imaging systems operate by emitting radar signals from a moving object, such as a satellite, towards the target of interest. Reflected radar echoes are received and later used by image formation algorithms to form a SAR image. There is great interest in using SAR images in computer vision tasks such as automatic target recognition. Today, however, SAR applications consist of multiple operations: image formation followed by image processing. In this work, we show that deep learning can be used to train a neural network able to form SAR images from echo data. Results show that our neural network, RDAnet, can form SAR images comparable to images formed using a traditional algorithm. This approach opens up the possibility of end-to-end SAR applications where image formation and image processing are integrated into a single task. We believe that this work is the first demonstration of deep learning based SAR image formation using real data.

Cross-lists for Thu, 23 Jan 20

[28]  arXiv:2001.07715 (cross-list from cs.RO) [pdf, other]
Title: TEASER: Fast and Certifiable Point Cloud Registration
Comments: 20 pages main text, 22 pages appendix
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Optimization and Control (math.OC)

We propose the first fast and certifiable algorithm for the registration of two sets of 3D points in the presence of large amounts of outlier correspondences. Towards this goal, we first reformulate the registration problem using a Truncated Least Squares (TLS) cost that makes the estimation insensitive to spurious correspondences. Then, we provide a general graph-theoretic framework to decouple scale, rotation, and translation estimation, which allows solving in cascade for the three transformations. Despite the fact that each subproblem is still non-convex and combinatorial in nature, we show that (i) TLS scale and (component-wise) translation estimation can be solved in polynomial time via an adaptive voting scheme, (ii) TLS rotation estimation can be relaxed to a semidefinite program (SDP) and the relaxation is tight, even in the presence of extreme outlier rates. We name the resulting algorithm TEASER (Truncated least squares Estimation And SEmidefinite Relaxation). While solving large SDP relaxations is typically slow, we develop a second certifiable algorithm, named TEASER++, that circumvents the need to solve an SDP and runs in milliseconds. For both algorithms, we provide theoretical bounds on the estimation errors, which are the first of their kind for robust registration problems. Moreover, we test their performance on standard benchmarks, object detection datasets, and the 3DMatch scan matching dataset, and show that (i) both algorithms dominate the state of the art (e.g., RANSAC, branch-&-bound, heuristics) and are robust to more than 99% outliers, (ii) TEASER++ can run in milliseconds and it is currently the fastest robust registration algorithm, (iii) TEASER++ is so robust it can also solve problems without correspondences (e.g., hypothesizing all-to-all correspondences) where it largely outperforms ICP. We release a fast open-source C++ implementation of TEASER++.
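
To illustrate why a Truncated Least Squares (TLS) cost is insensitive to spurious measurements, the following minimal sketch estimates a single scalar from data containing one gross outlier. It is not TEASER's adaptive voting scheme: the brute-force search over candidate consensus sets, the function names, the truncation threshold `c`, and the toy data are all assumptions for illustration.

```python
def tls_cost(estimate, measurements, c):
    """TLS cost: each squared residual is capped at c**2, so an outlier
    contributes at most a constant."""
    return sum(min((m - estimate) ** 2, c ** 2) for m in measurements)


def tls_estimate(measurements, c):
    """Brute-force sketch (hypothetical): for each measurement, average the
    measurements within distance c of it and keep the lowest-cost candidate."""
    best_cost, best_candidate = None, None
    for centre in measurements:
        inliers = [m for m in measurements if abs(m - centre) <= c]
        candidate = sum(inliers) / len(inliers)
        cost = tls_cost(candidate, measurements, c)
        if best_cost is None or cost < best_cost:
            best_cost, best_candidate = cost, candidate
    return best_candidate


# Four consistent scalar measurements and one gross outlier (toy values).
measurements = [1.0, 1.1, 0.9, 1.05, 50.0]
least_squares = sum(measurements) / len(measurements)  # pulled towards 50
robust = tls_estimate(measurements, c=0.5)             # stays near 1
print(least_squares, robust)
```

The plain average is dragged far from the consistent measurements by the single outlier, whereas the truncated cost caps that outlier's influence at `c**2`.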

[29]  arXiv:2001.07792 (cross-list from cs.CR) [pdf, other]
Title: GhostImage: Perception Domain Attacks against Vision-based Object Classification Systems
Subjects: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)

In vision-based object classification systems, imaging sensors perceive the environment and then objects are detected and classified for decision-making purposes. Vulnerabilities in the perception domain enable an attacker to inject false data into the sensor which could lead to unsafe consequences. In this work, we focus on camera-based systems and propose GhostImage attacks, with the goal of either creating a fake perceived object or obfuscating the object's image so that it leads to wrong classification results. This is achieved by remotely projecting adversarial patterns into camera-perceived images, exploiting two common effects in optical imaging systems, namely lens flare/ghost effects, and auto-exposure control. To improve the robustness of the attack to channel perturbations, we generate optimal input patterns by integrating adversarial machine learning techniques with a trained end-to-end channel model. We realize GhostImage attacks with a projector, and conduct comprehensive experiments, using three different image datasets, in indoor and outdoor environments, and three different cameras. We demonstrate that GhostImage attacks are applicable to both autonomous driving and security surveillance scenarios. Experimental results show that, depending on the projector-camera distance, attack success rates can reach as high as 100%.

[30]  arXiv:2001.07847 (cross-list from eess.IV) [pdf, other]
Title: Anomaly detection in chest radiographs with a weakly supervised flow-based deep learning method
Authors: H. Shibata (1), S. Hanaoka (2), Y. Nomura (1), T. Nakao (3), I. Sato (2 and 4 and 5), N. Hayashi (1), O. Abe (2 and 3) ((1) Department of Computational Diagnostic Radiology and Preventive Medicine, The University of Tokyo Hospital, (2) Department of Radiology, The University of Tokyo Hospital, (3) Division of Radiology and Biomedical Engineering, Graduate School of Medicine, The University of Tokyo, (4) Department of Complexity Science and Engineering, Graduate School of Frontier Sciences, The University of Tokyo, (5) Center for Advanced Intelligence Project, RIKEN)
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Preventing the oversight of anomalies in chest X-ray radiographs (CXRs) during diagnosis is a crucial issue. Deep learning (DL)-based anomaly detection methods are rapidly growing in popularity, and provide effective solutions to the problem, but the workload in labeling CXRs during the training procedure remains heavy. To reduce the workload, a novel anomaly detection method for CXRs based on weakly supervised DL is presented in this study. The DL is based on a flow-based deep neural network (DNN) framework with which two normality metrics (logarithm likelihood and logarithm likelihood ratio) can be calculated. With this method, only one set of normal CXRs requires labeling to train the DNN, then the normality of any unknown CXR can be evaluated. The area under the receiver operating characteristic curve acquired with the logarithm likelihood ratio metric (approximately 0.783) was greater than that obtained with the logarithm likelihood metric, and was comparable to values in previous studies where other weakly supervised DNNs were implemented.
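
As a reminder of what the reported metric measures, here is a minimal sketch of computing the area under the ROC curve directly from per-image scores. It uses the standard pairwise (Mann-Whitney) formulation; the function name `roc_auc`, the convention that higher scores mean "more anomalous" (for example a negated normality metric), and the toy scores are assumptions, not values from the study.

```python
def roc_auc(scores_abnormal, scores_normal):
    """AUC as the probability that a randomly chosen abnormal case scores
    higher than a randomly chosen normal case (ties count as one half)."""
    wins = 0.0
    for a in scores_abnormal:
        for n in scores_normal:
            if a > n:
                wins += 1.0
            elif a == n:
                wins += 0.5
    return wins / (len(scores_abnormal) * len(scores_normal))


# Toy anomaly scores (assumed), e.g. negated normality values per image.
abnormal_scores = [3.2, 2.5, 1.0]
normal_scores = [0.5, 1.5, 2.0, 0.8]
auc = roc_auc(abnormal_scores, normal_scores)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of normal and abnormal cases.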

[31]  arXiv:2001.08001 (cross-list from cs.LG) [pdf, ps, other]
Title: Safety Concerns and Mitigation Approaches Regarding the Use of Deep Learning in Safety-Critical Perception Tasks
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Deep learning methods are widely regarded as indispensable when it comes to designing perception pipelines for autonomous agents such as robots, drones or automated vehicles. The main reasons, however, for deep learning not being used for autonomous agents at large scale already are safety concerns. Deep learning approaches typically exhibit a black-box behavior which makes it hard for them to be evaluated with respect to safety-critical aspects. While there has been some work on safety in deep learning, most papers focus on high-level safety concerns. In this work, we seek to dive into the safety concerns of deep learning methods and present a concise enumeration on a deeply technical level. Additionally, we present extensive discussions on possible mitigation methods and give an outlook regarding what mitigation methods are still missing in order to facilitate an argumentation for the safety of a deep learning method.

[32]  arXiv:2001.08034 (cross-list from cs.CL) [pdf, other]
Title: ManyModalQA: Modality Disambiguation and QA over Diverse Inputs
Comments: AAAI 2020 (10 pages)
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

We present a new multimodal question answering challenge, ManyModalQA, in which an agent must answer a question by considering three distinct modalities: text, images, and tables. We collect our data by scraping Wikipedia and then utilize crowdsourcing to collect question-answer pairs. Our questions are ambiguous, in that the modality that contains the answer is not easily determined based solely upon the question. To demonstrate this ambiguity, we construct a modality selector (or disambiguator) network, and this model gets substantially lower accuracy on our challenge set, compared to existing datasets, indicating that our questions are more ambiguous. By analyzing this model, we investigate which words in the question are indicative of the modality. Next, we construct a simple baseline ManyModalQA model, which, based on the prediction from the modality selector, fires a corresponding pre-trained state-of-the-art unimodal QA model. We focus on providing the community with a new manymodal evaluation set and only provide a fine-tuning set, with the expectation that existing datasets and approaches will be transferred for most of the training, to encourage low-resource generalization without large, monolithic training sets for each new task. There is a significant gap between our baseline models and human performance; therefore, we hope that this challenge encourages research in end-to-end modality disambiguation and multimodal QA models, as well as transfer learning. Code and data available at: https://github.com/hannandarryl/ManyModalQA

[33]  arXiv:2001.08113 (cross-list from eess.IV) [pdf, other]
Title: DeepFL-IQA: Weak Supervision for Deep IQA Feature Learning
Comments: dataset url: this http URL
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Multi-level deep features have been driving state-of-the-art methods for aesthetics and image quality assessment (IQA). However, most IQA benchmarks consist of artificially distorted images, for which features derived from ImageNet under-perform. We propose a new IQA dataset and a weakly supervised feature learning approach to train features more suitable for IQA of artificially distorted images. The dataset, KADIS-700k, is far more extensive than similar works, consisting of 140,000 pristine images and 25 distortion types, totaling 700k distorted versions. Our weakly supervised feature learning is designed as a multi-task learning type training, using eleven existing full-reference IQA metrics as proxies for differential mean opinion scores. We also introduce a benchmark database, KADID-10k, of artificially degraded images, each subjectively annotated by 30 crowd workers. We make use of our derived image feature vectors for (no-reference) image quality assessment by training and testing a shallow regression network on this database and five other benchmark IQA databases. Our method, termed DeepFL-IQA, performs better than other feature-based no-reference IQA methods and also better than all tested full-reference IQA methods on KADID-10k. For the other five benchmark IQA databases, DeepFL-IQA matches the performance of the best existing end-to-end deep learning-based methods on average.

[34]  arXiv:2001.08126 (cross-list from eess.IV) [pdf, other]
Title: Optimizing Generative Adversarial Networks for Image Super Resolution via Latent Space Regularization
Authors: Sheng Zhong, Shifu Zhou (Agora.io)
Comments: 11 pages, 5 figures
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Natural images can be regarded as residing in a manifold that is embedded in a higher dimensional Euclidean space. Generative Adversarial Networks (GANs) try to learn the distribution of the real images in the manifold to generate samples that look real. But the results of existing methods still exhibit many unpleasant artifacts and distortions even for the cases where the desired ground truth target images are available for supervised learning such as in single image super resolution (SISR). We probe for ways to alleviate these problems for supervised GANs in this paper. We explicitly apply the Lipschitz Continuity Condition (LCC) to regularize the GAN. An encoding network that maps the image space to a new optimal latent space is derived from the LCC, and it is used to augment the GAN as a coupling component. The LCC is also converted to new regularization terms in the generator loss function to enforce local invariance. The GAN is optimized together with the encoding network in an attempt to make the generator converge to a more ideal and disentangled mapping that can generate samples more faithful to the target images. When the proposed models are applied to the single image super resolution problem, the results outperform the state of the art.

[35]  arXiv:2001.08142 (cross-list from cs.LG) [pdf, other]
Title: Pruning CNN's with linear filter ensembles
Comments: accepted to ECAI2020
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Despite the promising results of convolutional neural networks (CNNs), applying them on resource-limited devices is still a challenge, mainly due to the huge memory and computation requirements. To tackle these problems, pruning can be applied to reduce the network size and number of floating point operations (FLOPs). In contrast to the filter norm method, which is used in network pruning and assumes that the smaller this norm, the less important the associated component, we develop a novel filter importance norm that incorporates the loss caused by the elimination of a component from the CNN.
To estimate the importance of a set of architectural components, we measure the CNN performance as different components are removed. The result is a collection of filter ensembles (filter masks) and associated performance values. We rank the filters based on a linear and additive model and remove the least important ones such that the drop in network accuracy is minimal. We evaluate our method on a fully connected network, as well as on the ResNet architecture trained on the CIFAR-10 data-set. Using our pruning method, we managed to remove 60% of the parameters and 64% of the FLOPs from the ResNet with an accuracy drop of less than 0.6%.
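
The idea of ranking filters from a collection of filter masks and their measured performance can be sketched as follows. This is an illustrative stand-in, not the authors' estimator: importance is taken as the difference between mean accuracy with a filter kept and mean accuracy with it removed, which matches a least-squares fit of a linear additive model only for balanced designs such as the full enumeration of masks used here. All names and the toy accuracies are hypothetical.

```python
from itertools import product


def filter_importance(masks, accuracies):
    """Per-filter importance as mean accuracy when the filter is kept (1)
    minus mean accuracy when it is removed (0). Hypothetical helper."""
    n_filters = len(masks[0])
    importance = []
    for j in range(n_filters):
        kept = [a for m, a in zip(masks, accuracies) if m[j] == 1]
        removed = [a for m, a in zip(masks, accuracies) if m[j] == 0]
        importance.append(sum(kept) / len(kept) - sum(removed) / len(removed))
    return importance


# Toy setup (assumed): three filters whose contributions add up linearly.
true_contribution = [0.30, 0.05, 0.15]
masks = list(product([0, 1], repeat=3))  # every keep/remove combination
accuracies = [
    0.40 + sum(w * bit for w, bit in zip(true_contribution, mask))
    for mask in masks
]

importance = filter_importance(masks, accuracies)
# Filters listed from least to most important, i.e. the order to prune in.
pruning_order = sorted(range(len(importance)), key=lambda j: importance[j])
print(importance, pruning_order)
```

With a real network the number of masks is far too large to enumerate, so masks would have to be sampled and the additive model fitted by regression instead.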

Replacements for Thu, 23 Jan 20

[36]  arXiv:1711.07829 (replaced) [pdf]
Title: Discussion among Different Methods of Updating Model Filter in Object Tracking
Comments: 8 pages, 3 figures, SPIE 10th International Symposium on Multispectral Image Processing and Pattern Recognition
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[37]  arXiv:1711.07835 (replaced) [pdf]
Title: Robust Object Tracking Based on Self-adaptive Search Area
Comments: 10 pages, 4 figures, 3 tables, SPIE 10th International Symposium on Multispectral Image Processing and Pattern Recognition
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[38]  arXiv:1712.06326 (replaced) [pdf]
Title: Space-Filling Curve Indices as Acceleration Structure for Exemplar-Based Inpainting
Comments: submitted to Signal Processing: Image Communication
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[39]  arXiv:1805.05638 (replaced) [pdf, other]
Title: Ro-SOS: Metric Expression Network (MEnet) for Robust Salient Object Segmentation
Comments: This version: 11 pages (12 with reference), 12 figures, 5 table; Version 1: 7 pages,7 figures, 4 tables; The paper for version 1 has been accepted by International Joint Conference on Artificial Intelligence (IJCAI),2018
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[40]  arXiv:1808.01047 (replaced) [pdf, other]
Title: A Data Dependent Multiscale Model for Hyperspectral Unmixing With Spectral Variability
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[41]  arXiv:1903.07933 (replaced) [pdf, other]
Title: What the Constant Velocity Model Can Teach Us About Pedestrian Motion Prediction
Comments: Accepted for publication in the IEEE Robotics and Automation Letters (RA-L) and for presentation at the 2020 International Conference on Robotics and Automation (ICRA)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Robotics (cs.RO)
[42]  arXiv:1906.01905 (replaced) [pdf, other]
Title: Baby steps towards few-shot learning with multiple semantics
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[43]  arXiv:1909.02410 (replaced) [pdf, other]
Title: Semantic-Aware Scene Recognition
Comments: Paper submitted for publication to Elsevier Pattern Recognition journal
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[44]  arXiv:1909.08605 (replaced) [pdf, other]
Title: Graduated Non-Convexity for Robust Spatial Perception: From Non-Minimal Solvers to Global Outlier Rejection
Comments: 10 pages, 5 figures, published at IEEE Robotics and Automation Letters (RA-L), 2020
Journal-ref: IEEE Robotics and Automation Letters (RA-L), 2020
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO); Optimization and Control (math.OC)
[45]  arXiv:1909.09931 (replaced) [pdf, other]
Title: Volume Preserving Image Segmentation with Entropic Regularization Optimal Transport and Its Applications in Deep Learning
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[46]  arXiv:1909.10147 (replaced) [pdf, other]
Title: Robust Local Features for Improving the Generalization of Adversarial Training
Comments: ICLR 2020
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[47]  arXiv:1911.01577 (replaced) [pdf, other]
Title: Improving Long Handwritten Text Line Recognition with Convolutional Multi-way Associative Memory
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[48]  arXiv:1911.12425 (replaced) [pdf, other]
Title: Learning with less data via Weakly Labeled Patch Classification in Digital Pathology
Comments: To appear in IEEE International Symposium on Biomedical Imaging (ISBI) 2020
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[49]  arXiv:2001.00346 (replaced) [pdf, other]
Title: First image then video: A two-stage network for spatiotemporal video denoising
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[50]  arXiv:2001.01037 (replaced) [pdf, other]
Title: Understanding Image Captioning Models beyond Visualizing Attention
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
[51]  arXiv:2001.05643 (replaced) [pdf, other]
Title: PDANet: Pyramid Density-aware Attention Net for Accurate Crowd Counting
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[52]  arXiv:2001.05868 (replaced) [pdf, ps, other]
Title: Filter Grafting for Deep Neural Networks
Comments: 11 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[53]  arXiv:1903.08111 (replaced) [pdf, other]
Title: Preconditioned P-ULA for Joint Deconvolution-Segmentation of Ultrasound Images -- Extended Version
Journal-ref: In IEEE Signal Processing Letters, vol.26, no.10, pp.1456-1460 (2019)
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[54]  arXiv:1910.04961 (replaced) [pdf, other]
Title: Adversarial Pulmonary Pathology Translation for Pairwise Chest X-ray Data Augmentation
Comments: Code: this https URL - Accepted to the International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI) 2019
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[55]  arXiv:2001.00138 (replaced) [pdf, other]
Title: PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning
Comments: To be published in the Proceedings of Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 20)
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC)
[56]  arXiv:2001.05887 (replaced) [pdf, other]
Title: MixPath: A Unified Approach for One-shot Neural Architecture Search
Comments: Bridge the gap between one shot NAS and multi branch using shadow BN. Model Ranking is good
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[57]  arXiv:2001.07183 (replaced) [pdf, other]
Title: Learning Deformable Registration of Medical Images with Anatomical Constraints
Comments: Accepted for publication in Neural Networks (Elsevier). Source code and resulting segmentation masks for the NIH Chest-XRay14 dataset with estimated quality index available at this https URL
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)