Electrical Engineering and Systems Science

New submissions

[ total of 75 entries: 1-75 ]

New submissions for Fri, 29 May 20

[1]  arXiv:2005.13595 [pdf, ps, other]
Title: Characterizing Quality of Experience for Demand Management in South Brazil
Subjects: Systems and Control (eess.SY)

This work reports the results of a survey conducted in Florianópolis, in the southern region of Brazil, using a digital questionnaire that asked about interaction with demand management systems (DMSs) in a smart house context. In particular, the survey addressed the interviewees' thoughts on demand management and gathered data to enable the design of Quality of Experience (QoE) based demand management policies. Investigating QoE-aware approaches is significant because it enables the DMS to make decisions based not only on economic metrics but also on the discomfort caused to users. The number of responses guaranteed a confidence level of 95% and an error margin of 4.63%, and the responses showed that the interviewees were favorably disposed to allowing DMS interventions in their energy consumption habits as long as these reduce their energy expenses. The data on discomfort caused by demand management actions on several home appliances was processed using clustering techniques, yielding different typical user profiles, each represented by the centroid of its cluster. These profiles can be used to deploy QoE-aware DMSs tailored to the local reality, which appears to be a promising business model as smart houses become a reality.
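
As a rough plausibility check of the sampling figures quoted above, the sketch below evaluates the standard margin-of-error formula for an estimated proportion. It assumes a worst-case proportion p = 0.5, a large population (no finite-population correction), and z = 1.96 for 95% confidence. None of these assumptions is stated in the abstract, and the function names are illustrative only.

```python
import math

Z_95 = 1.96  # two-sided z-score for a 95% confidence level


def margin_of_error(n, p=0.5, z=Z_95):
    """Margin of error for a proportion estimated from n responses."""
    return z * math.sqrt(p * (1.0 - p) / n)


def responses_needed(margin, p=0.5, z=Z_95):
    """Smallest number of responses whose margin of error is at most `margin`."""
    return math.ceil(z * z * p * (1.0 - p) / (margin * margin))
```

Under these assumptions, a 4.63% margin corresponds to a sample of roughly 450 responses; the survey's actual sample size is not given in the abstract.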

[2]  arXiv:2005.13609 [pdf, other]
Title: Hybrid Voltage Stability and Security Assessment using Synchrophasors with Consideration of Generator Q-limits
Subjects: Systems and Control (eess.SY)

Increased power demands, the push for economics, and limited investment in grid infrastructure have led utilities to operate power systems closer to their stability limits. Voltage instability may trigger cascade tripping, wide-area voltage collapse and power blackouts. Real-time voltage stability monitoring, made possible by the deployment of phasor measurement units (PMUs), is essential to take proactive control actions and minimize the impact on the system. This paper presents a novel online algorithm for a) hybrid perturbation analysis-based voltage stability monitoring (HPVSM), b) including the Q-limit in the voltage stability index and c) real-time security analysis using the voltage stability index. The HPVSM-based voltage stability index is computed using data obtained from a linear state estimator and PMU measurements. Typically, measurement-based schemes ignore the impact of generator Q-limits, and security analysis is not feasible. The proposed HPVSM-based index considers the impact of generator Q-limit violations by anticipating the critical generators using real-time PMU measurements. Contingencies are ranked using the proposed voltage stability index for security analysis. Results simulated for the 9-bus WECC and the IEEE 14-, 57- and 118-bus systems highlight the superiority of the proposed method in real-time voltage stability and security analysis.

[3]  arXiv:2005.13616 [pdf, other]
Title: Modality Dropout for Improved Performance-driven Talking Faces
Comments: Pre-print
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)

We describe our novel deep learning approach for driving animated faces using both acoustic and visual information. In particular, speech-related facial movements are generated using audiovisual information, and non-speech facial movements are generated using only visual information. To ensure that our model exploits both modalities during training, batches are generated that contain audio-only, video-only, and audiovisual input features. The probability of dropping a modality allows control over the degree to which the model exploits audio and visual information during training. Our trained model runs in real time on resource-limited hardware (e.g., a smartphone), is user agnostic, and does not depend on a potentially error-prone transcription of the speech. We use subjective testing to demonstrate: 1) the improvement of audiovisual-driven animation over the equivalent video-only approach, and 2) the improvement in the animation of speech-related facial movements after introducing modality dropout. Before introducing dropout, viewers prefer audiovisual-driven animation in 51% of the test sequences, compared with only 18% for video-driven animation. After introducing dropout, viewer preference for audiovisual-driven animation increases to 74%, while preference for video-only animation decreases to 8%.
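
The modality dropout idea described above can be sketched as follows. This is only an illustration, not the authors' implementation: it assumes that a dropped modality is represented by zeroing its feature vector and that the decision is made per training example, neither of which the abstract specifies. The function name and parameters are hypothetical.

```python
import random


def modality_dropout(audio, video, p_drop_audio, p_drop_video, rng=random):
    """Return (audio, video) features with at most one modality zeroed.

    Assumption: "dropping" a modality means replacing its features with zeros.
    """
    if p_drop_audio < 0 or p_drop_video < 0 or p_drop_audio + p_drop_video > 1:
        raise ValueError("drop probabilities must be non-negative and sum to at most 1")
    r = rng.random()  # uniform draw in [0, 1)
    if r < p_drop_audio:
        return [0.0] * len(audio), list(video)  # video-only example
    if r < p_drop_audio + p_drop_video:
        return list(audio), [0.0] * len(video)  # audio-only example
    return list(audio), list(video)  # audiovisual example
```

Raising `p_drop_video` relative to `p_drop_audio` produces more audio-only examples, which is one way the drop probabilities could steer how strongly a model leans on each modality.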

[4]  arXiv:2005.13643 [pdf, ps, other]
Title: Segmentation of the Myocardium on Late-Gadolinium Enhanced MRI based on 2.5 D Residual Squeeze and Excitation Deep Learning Model
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Cardiac left ventricular (LV) segmentation from short-axis MRI acquired 10 minutes after the injection of a contrast agent (LGE-MRI) is a necessary processing step for identifying and diagnosing cardiac diseases such as myocardial infarction. However, this segmentation is challenging due to high variability across subjects and the potential lack of contrast between structures. The main objective of this work is therefore to develop an accurate automatic segmentation method, based on deep learning models, for the myocardial borders on LGE-MRI. To this end, a 2.5D residual neural network with squeeze-and-excitation blocks on the encoder side and specialized convolutions is proposed. Late fusion is used to merge the outputs of the best trained models obtained from different sets of hyperparameters. A total of 320 exams (with a mean of 6 slices per exam) were used for training and 28 exams for testing. The performance of the proposed ensemble model on the basal and middle slices was similar to that of the intra-observer study, and slightly lower on apical slices. The overall Dice score of the proposed method was 82.01%, compared with 83.22% for the intra-observer study. The proposed model could be used for automatic segmentation of the myocardial border, a very important step for accurate quantification of no-reflow, myocardial infarction, myocarditis, and hypertrophic cardiomyopathy, among others.

[5]  arXiv:2005.13690 [pdf, other]
Title: Multiple resolution residual network for automatic thoracic organs-at-risk segmentation from CT
Comments: MIDL 2020 short paper
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

We implemented and evaluated a multiple resolution residual network (MRRN) for multiple normal organs-at-risk (OAR) segmentation from computed tomography (CT) images for thoracic radiotherapy treatment (RT) planning. Our approach simultaneously combines feature streams computed at multiple image resolutions and feature levels through residual connections. The feature streams at each level are updated as the images are passed through various feature levels. We trained our approach using 206 thoracic CT scans of lung cancer patients with 35 scans held out for validation to segment the left and right lungs, heart, esophagus, and spinal cord. This approach was tested on 60 CT scans from the open-source AAPM Thoracic Auto-Segmentation Challenge dataset. Performance was measured using the Dice Similarity Coefficient (DSC). Our approach outperformed the best-performing method in the grand challenge for hard-to-segment structures like the esophagus and achieved comparable results for all other structures. Median DSC using our method was 0.97 (interquartile range [IQR]: 0.97-0.98) for the left and right lungs, 0.93 (IQR: 0.93-0.95) for the heart, 0.78 (IQR: 0.76-0.80) for the esophagus, and 0.88 (IQR: 0.86-0.89) for the spinal cord.
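
For reference, the Dice Similarity Coefficient used as the evaluation metric above is twice the overlap between two masks divided by the sum of their sizes. The following minimal sketch computes it for binary masks given as flat sequences of 0/1 values. The function name and the convention of returning 1.0 for two empty masks are assumptions made here, not details from the paper.

```python
def dice_coefficient(pred, truth):
    """Dice Similarity Coefficient between two equally sized binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of elements")
    # Count voxels labeled foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / total
```

A value of 1.0 means the predicted and reference masks coincide, and 0.0 means they do not overlap at all.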

[6]  arXiv:2005.13694 [pdf, other]
Title: Learning Secured Modulation With Deep Adversarial Neural Networks
Comments: This Paper is accepted in IEEE VTC2020-Fall
Subjects: Signal Processing (eess.SP)

Growing interest in utilizing the wireless spectrum by heterogeneous devices compels us to rethink physical layer security to protect the transmitted waveform from an eavesdropper. We propose an end-to-end symmetric key neural encryption and decryption algorithm with a modulation technique, which remains undeciphered by an eavesdropper equipped with the same neural network and trained on the same dataset as the intended users. We solve encryption and modulation as a joint problem for which we map the bits to complex analog signals, without adhering to any particular encryption algorithm or modulation technique. We train our trusted pair of neural networks to cooperatively learn encryption and decryption algorithms, while the eavesdropper's model is trained adversarially on the same data to minimize the error. We introduce a discrete activation layer with a defined gradient to combat noise in a lossy channel. Our results show that a trusted pair of users can exchange data bits in both clean and noisy channels, where a trained adversary cannot decipher the data.

[7]  arXiv:2005.13695 [pdf, other]
Title: An ENAS Based Approach for Constructing Deep Learning Models for Breast Cancer Recognition from Ultrasound Images
Comments: 6 pages, 3 figures, Conference: Medical Imaging with Deep Learning 2020
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Deep Convolutional Neural Networks (CNNs) provide an "end-to-end" solution for image pattern recognition with impressive performance in many areas of application including medical imaging. Most high-performance CNN models use hand-crafted network architectures that require expertise in CNNs to exploit their potential. In this paper, we apply the Efficient Neural Architecture Search (ENAS) method to find optimal CNN architectures for classifying breast lesions from ultrasound (US) images. Our empirical study with a dataset of 524 US images shows that the optimal models generated by ENAS achieve an average accuracy of 89.3%, surpassing other hand-crafted alternatives. Furthermore, the models are simpler and more efficient. Our study demonstrates that the ENAS approach to CNN model design is a promising direction for classifying ultrasound images of breast lesions.

[8]  arXiv:2005.13740 [pdf, ps, other]
Title: On BT-limited Signals
Authors: Xiang-Gen Xia
Subjects: Signal Processing (eess.SP)

In this letter, we introduce and characterize a subspace of bandlimited signals. The subspace consists of all $\Omega$ bandlimited signals such that the non-zero parts of their Fourier transforms are pieces of some $T$ bandlimited signals. The signals in the subspace are called BT-limited signals, and the subspace is named the BT-limited signal space. For BT-limited signals, a signal extrapolation with an analytic error estimate exists outside the interval $[-T, T]$ of given signal values with errors.

[9]  arXiv:2005.13769 [pdf, other]
Title: Unsupervised Audio Source Separation using Generative Priors
Comments: 5 pages, 2 figures
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Machine Learning (stat.ML)

State-of-the-art under-determined audio source separation systems rely on supervised end-to-end training of carefully tailored neural network architectures operating either in the time or the spectral domain. However, these methods are severely limited by their need for expensive source-level labeled data and by being specific to a given set of sources and mixing process, which demands complete re-training when those assumptions change. This strongly emphasizes the need for unsupervised methods that can leverage the recent advances in data-driven modeling and compensate for the lack of labeled data through meaningful priors. To this end, we propose a novel approach for audio source separation based on generative priors trained on individual sources. Through the use of projected gradient descent optimization, our approach simultaneously searches in the source-specific latent spaces to effectively recover the constituent sources. Though the generative priors can be defined in the time domain directly, e.g. WaveGAN, we find that using spectral domain loss functions for our optimization leads to good-quality source estimates. Our empirical studies on standard spoken digit and instrument datasets clearly demonstrate the effectiveness of our approach over classical as well as state-of-the-art unsupervised baselines.

[10]  arXiv:2005.13770 [pdf, other]
Title: DeepSonar: Towards Effective and Robust Detection of AI-Synthesized Fake Voices
Subjects: Audio and Speech Processing (eess.AS); Cryptography and Security (cs.CR); Multimedia (cs.MM); Sound (cs.SD)

With recent advances in voice synthesis such as WaveNet, AI-synthesized fake voices are indistinguishable to human ears and are widely applied to produce realistic and natural DeepFakes, which are real threats to everyone. However, effective and robust detectors for synthesized fake voices are still in their infancy and are not ready to fully tackle this emerging threat. In this paper, we devise a novel approach, named DeepSonar, based on monitoring the neuron behaviors of a speaker recognition (SR) system, i.e., a deep neural network (DNN), to discern AI-synthesized fake voices. Layer-wise neuron behaviors provide an important insight for hunting the differences among inputs and are widely employed for building safe, robust and interpretable DNNs. In this work, we leverage the power of layer-wise neuron activation patterns, conjecturing that they can capture the subtle differences between real and AI-synthesized fake voices and provide a cleaner signal to classifiers than raw inputs. Experiments are conducted on three datasets (including commercial products from Google, Baidu, etc.) containing both English and Chinese languages to corroborate the high detection rates (98.1% average accuracy) and low false alarm rates (0.02 equal error rate) of DeepSonar in discerning fake voices. Furthermore, extensive experimental results show its robustness against manipulation attacks (e.g., voice conversion and additive real-world noises). Our work also offers a new insight into adopting neuron behaviors for effective and robust AI-aided forensics of multimedia fakes, instead of being motivated by the various artifacts introduced in fakes.

[11]  arXiv:2005.13781 [pdf, other]
Title: A Maneuver-based Urban Driving Dataset and Model for Cooperative Vehicle Applications
Subjects: Signal Processing (eess.SP); Robotics (cs.RO)

The short-term future of automated driving can be imagined as a hybrid scenario in which automated and human-driven vehicles co-exist in the same environment. To address the needs of such a road configuration, many technology solutions, such as vehicular communication and predictive control for automated vehicles, have been introduced in the literature. Both of these solutions rely on driving data of the human driver. In this work, we investigate the currently available driving datasets and introduce a real-world maneuver-based driving dataset collected during our urban driving data collection campaign. We also provide a model that embeds the patterns in maneuver-specific samples. Such a model can be employed for classification and prediction purposes.

[12]  arXiv:2005.13835 [pdf, other]
Title: Speech-to-Singing Conversion based on Boundary Equilibrium GAN
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)

This paper investigates the use of generative adversarial network (GAN)-based models for converting the spectrogram of a speech signal into that of a singing one, without reference to the phoneme sequence underlying the speech. This is achieved by viewing speech-to-singing conversion as a style transfer problem. Specifically, given a speech input, and optionally the F0 contour of the target singing, the proposed model generates a singing signal as output, using a progressive-growing encoder/decoder architecture and boundary equilibrium GAN loss functions. Our quantitative and qualitative analyses show that the proposed model generates singing voices with much higher naturalness than an existing non-adversarially trained baseline. For reproducibility, the code will be publicly available at a GitHub repository upon paper publication.

[13]  arXiv:2005.13877 [pdf, other]
Title: The optimal sequence for reset controllers
Subjects: Systems and Control (eess.SY)

PID controllers cannot satisfy high performance requirements because they are restricted by the waterbed effect. Thus, the need for a better alternative to linear PID controllers increases with the rising demands of the high-tech industry. This has led many researchers to explore nonlinear controllers such as reset control. Although reset controllers have been widely used in the literature to overcome the limitations of linear controllers, the performance of the system varies depending on the relative sequence of the controller's linear and nonlinear parts. In this paper, the optimal sequence is found using high order sinusoidal input describing functions (HOSIDF). By arranging the controller parts according to this strategy, better performance in terms of precision and control input is achieved. The performance of the proposed sequence is validated on a precision positioning setup. The experimental results demonstrate that the optimal sequence found in theory outperforms other sequences.
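
To illustrate why the ordering of linear and nonlinear parts can matter, the sketch below places a simple reset element (a Clegg integrator, which zeroes its state whenever its input crosses zero) before or after a first-order low-pass filter and compares the outputs for a sinusoidal input. This is a toy discrete-time illustration under assumed parameters, not the controller structure or the HOSIDF analysis of the paper. All names and values are hypothetical.

```python
import math


def clegg_integrator(signal, dt):
    """Integrate `signal`, resetting the state to zero at input zero crossings."""
    state, prev, out = 0.0, None, []
    for e in signal:
        if prev is not None and (e == 0.0 or prev * e < 0.0):
            state = 0.0  # reset action of the nonlinear element
        state += e * dt
        out.append(state)
        prev = e
    return out


def low_pass(signal, alpha):
    """First-order low-pass filter, standing in for a linear controller part."""
    y, out = 0.0, []
    for x in signal:
        y += alpha * (x - y)
        out.append(y)
    return out


dt = 0.02  # assumed sample time
error = [math.sin(2.0 * math.pi * k / 50.0) for k in range(200)]

# Same two elements, opposite order.
lag_then_reset = clegg_integrator(low_pass(error, 0.2), dt)
reset_then_lag = low_pass(clegg_integrator(error, dt), 0.2)
largest_gap = max(abs(a - b) for a, b in zip(lag_then_reset, reset_then_lag))
```

Two linear blocks would commute, so `largest_gap` would be zero up to rounding. Here the filter shifts the zero crossings seen by the reset element, so the reset instants differ between the two orderings and the outputs diverge.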

[14]  arXiv:2005.13895 [pdf, other]
Title: When Can Self-Attention Be Replaced by Feed Forward Layers?
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)

Recently, self-attention models such as Transformers have given competitive results compared to recurrent neural network systems in speech recognition. The key factor for the outstanding performance of self-attention models is their ability to capture temporal relationships without being limited by the distance between two related events. However, we note that the range of the learned context progressively increases from the lower to upper self-attention layers, whilst acoustic events often happen within short time spans in a left-to-right order. This leads to a question: for speech recognition, is a global view of the entire sequence still important for the upper self-attention layers in the encoder of Transformers? To investigate this, we replace these self-attention layers with feed forward layers. In our speech recognition experiments (Wall Street Journal and Switchboard), we indeed observe an interesting result: replacing the upper self-attention layers in the encoder with feed forward layers leads to no performance drop, and even minor gains. Our experiments offer insights into how self-attention layers process the speech signal, leading to the conclusion that the lower self-attention layers of the encoder encode a sufficiently wide range of inputs, so learning further contextual information in the upper layers is unnecessary.

[15]  arXiv:2005.13899 [pdf, other]
Title: Deep Learning for Automatic Pneumonia Detection
Comments: to appear in CVPR 2020 Workshops proceedings
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Pneumonia is the leading cause of death among young children and one of the top causes of mortality worldwide. Pneumonia detection is usually performed through examination of chest X-ray radiographs by highly trained specialists. This process is tedious and often leads to disagreement between radiologists. Computer-aided diagnosis systems have shown potential for improving diagnostic accuracy. In this work, we develop a computational approach for pneumonia region detection based on single-shot detectors, squeeze-and-excitation deep convolutional neural networks, augmentations and multi-task learning. The proposed approach was evaluated in the context of the Radiological Society of North America Pneumonia Detection Challenge, achieving one of the best results in the challenge.

[16]  arXiv:2005.13928 [pdf, other]
Title: Early Screening of SARS-CoV-2 by Intelligent Analysis of X-Ray Images
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Future SARS-CoV-2 virus outbreaks (COVID-XX) might occur in the coming years. However, the pathology in humans is so recent that many clinical aspects, such as early detection of complications, side effects after recovery or early screening, are currently unknown. Despite the number of COVID-19 cases, its rapid spread, which has put many health systems on the edge of collapse, has hindered proper collection and analysis of data related to COVID-19 clinical aspects. We describe an interdisciplinary initiative that integrates clinical research with image diagnostics and new technologies such as artificial intelligence and radiomics, with the aim of clarifying some open questions about SARS-CoV-2. The whole initiative addresses 3 main points: 1) collection of standardized data including images, clinical data and analytics; 2) COVID-19 screening for early diagnosis at primary care centers; 3) definition of radiomic signatures of COVID-19 evolution and associated pathologies for the early treatment of complications. In particular, in this paper we present a general overview of the project, the experimental design and first results of X-ray COVID-19 detection using a classic approach based on HoG and feature selection. Our experiments include a comparison with some recent methods for COVID-19 screening in X-ray and an exploratory analysis of the feasibility of X-ray COVID-19 screening. Results show that classic approaches can outperform deep-learning methods in this experimental setting, indicate the feasibility of early COVID-19 screening, and show that non-COVID infiltration is the patient group most similar to COVID-19 in terms of radiological description of X-ray. Therefore, efficient COVID-19 screening should be complemented with other clinical data to better discriminate these cases.

[17]  arXiv:2005.13981 [pdf]
Title: The INTERSPEECH 2020 Deep Noise Suppression Challenge: Datasets, Subjective Testing Framework, and Challenge Results
Comments: Interspeech 2020. arXiv admin note: substantial text overlap with arXiv:2001.08662
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)

The INTERSPEECH 2020 Deep Noise Suppression (DNS) Challenge is intended to promote collaborative research in real-time single-channel Speech Enhancement aimed at maximizing the subjective (perceptual) quality of the enhanced speech. A typical approach to evaluating noise suppression methods is to use objective metrics on a test set obtained by splitting the original dataset. While performance is good on the synthetic test set, model performance often degrades significantly on real recordings. Also, most conventional objective metrics do not correlate well with subjective tests, and lab subjective tests are not scalable to a large test set. In this challenge, we open-sourced a large clean speech and noise corpus for training noise suppression models and a test set representative of real-world scenarios, consisting of both synthetic and real recordings. We also open-sourced an online subjective test framework based on ITU-T P.808 for researchers to reliably test their developments. We evaluated the results using P.808 on a blind test set. The results and the key learnings from the challenge are discussed. The datasets and scripts can be found at https://github.com/microsoft/DNS-Challenge.

[18]  arXiv:2005.13987 [pdf, other]
Title: A deep learning-based pipeline for error detection and quality control of brain MRI segmentation results
Comments: 5 pages, 2 figures, to be included into the arXiv compendium of the conference MIDL 2020
Subjects: Image and Video Processing (eess.IV)

Brain MRI segmentation results should always undergo a quality control (QC) process, since automatic segmentation tools can be prone to errors. In this work, we propose two deep learning-based architectures for performing QC automatically. First, we used generative adversarial networks for creating error maps that highlight the locations of segmentation errors. Subsequently, a 3D convolutional neural network was implemented to predict segmentation quality. The present pipeline was shown to achieve promising results and, in particular, high sensitivity in both tasks.

[19]  arXiv:2005.13992 [pdf]
Title: A Novel Ramp Metering Approach Based on Machine Learning and Historical Data
Comments: 5 pages, 11 figures, 2 tables
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Signal Processing (eess.SP); Machine Learning (stat.ML)

The random nature of traffic conditions on freeways can cause excessive congestion and irregularities in the traffic flow. Ramp metering is a proven, effective method for maintaining freeway efficiency under various traffic conditions. Creating a reliable and practical ramp metering algorithm that considers both critical traffic measures and historical data is still a challenging problem. In this study, we use machine learning approaches to develop a novel real-time prediction model for ramp metering. We evaluate the potential of our approach by comparing it with a baseline traffic-responsive ramp metering algorithm.

[20]  arXiv:2005.14003 [pdf, other]
Title: Hybrid data and model driven algorithms for angular power spectrum estimation
Subjects: Signal Processing (eess.SP)

We propose two algorithms that use both models and datasets to estimate angular power spectra from channel covariance matrices in massive MIMO systems. The first algorithm is an iterative fixed-point method that solves a hierarchical problem. It uses model knowledge to narrow down candidate angular power spectra to a set that is consistent with a measured covariance matrix. Then, from this set, the algorithm selects the angular power spectrum with minimum distance to its expected value with respect to a Hilbertian metric learned from data. The second algorithm solves an alternative optimization problem with a single application of a solver for nonnegative least squares programs. By fusing information obtained from datasets and models, both algorithms can outperform existing approaches based on models, and they are also robust against environmental changes and small datasets.

[21]  arXiv:2005.14017 [pdf, other]
Title: A Normalized Fully Convolutional Approach to Head and Neck Cancer Outcome Prediction
Comments: 6 pages, 1 figure, 1 table, Medical Imaging with Deep Learning 2020 conference
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

In medical imaging, radiological scans of different modalities serve to enhance different sets of features for clinical diagnosis and treatment planning. This variety enriches the source information that could be used for outcome prediction. Deep learning methods are particularly well-suited for feature extraction from high-dimensional inputs such as images. In this work, we apply a CNN classification network augmented with an FCN preprocessor sub-network to a public TCIA head and neck cancer dataset. The training goal is survival prediction of radiotherapy cases based on pre-treatment FDG PET-CT scans, acquired across 4 different hospitals. We show that the preprocessor sub-network, in conjunction with aggregated residual connections, leads to improvements over state-of-the-art results when combining both CT and PET input images.

[22]  arXiv:2005.14022 [pdf, other]
Title: Differentiation of Internal Faults in Power Transformers using Decision Tree based Classifiers
Subjects: Signal Processing (eess.SP)

This paper proposes a Decision Tree (DT) based classification of internal faults in a power transformer. The faults are simulated in Power System Computer Aided Design (PSCAD)/Electromagnetic Transients including DC (EMTDC) by varying the fault resistance, fault inception angle, and percentage of winding under fault. A total of 1146 time- and frequency-domain features are extracted from the differential currents in phases a, b, and c. Of these, the 3 most relevant features are selected to distinguish internal faults in the primary and secondary of the transformer. DT, Random Forest (RF), and Gradient Boost (GB) classifiers are used to determine the fault types. The results show that the GB classifier performed the best among the three classifiers considered.

[23]  arXiv:2005.14036 [pdf, other]
Title: Image Restoration from Parametric Transformations using Generative Models
Subjects: Image and Video Processing (eess.IV); Machine Learning (cs.LG); Machine Learning (stat.ML)

When images are statistically described by a generative model, we can use this information to develop optimum techniques for various image restoration problems such as inpainting, super-resolution, image coloring, generative model inversion, etc. With the help of the generative model it is possible to formulate these restoration problems, in a natural way, as statistical estimation problems. Our approach, by combining maximum a-posteriori probability with maximum likelihood estimation, is capable of restoring images that are distorted by transformations even when the latter contain unknown parameters. This contrasts with the current state of the art, which requires exact knowledge of the transformations. Our method also does not contain any regularizer terms with unknown weights that need to be properly selected, as is common practice in all recent generative image restoration techniques. Finally, we extend our method to accommodate combinations of multiple images, where each image is described by its own generative model and the participating images are separated from a single combination.

[24]  arXiv:2005.14039 [pdf]
Title: 3D logic cells design and results based on Vertical NWFET technology including tied compact model
Comments: Paper submitted to IFIP/IEEE International Conference on Very Large Scale Integration (VLSI-SoC), 5-7 October 2020, Salt Lake City (UT), USA
Subjects: Systems and Control (eess.SY); Emerging Technologies (cs.ET); Applied Physics (physics.app-ph)

Gate-all-around Vertical Nanowire Field Effect Transistors (VNWFET) are emerging devices that are well suited to pursuing scaling beyond the lateral scaling limitations around 7nm. This work explores the relative merits and drawbacks of the technology in the context of logic cell design. We describe a junctionless nanowire technology and an associated compact model, which accurately describes fabricated device behavior in all regions of operation for transistors based on between 16 and 625 parallel nanowires with diameters between 22 and 50nm. We used this model to simulate the projected performance of inverter logic gates based on passive load, active load and complementary topologies, and to carry out a performance exploration for the number of nanowires in transistors. In terms of compactness, through a dedicated full 3D layout design, we also demonstrate a 1.4x reduction in lateral dimensions for the complementary structure with respect to 7nm FinFET-based inverters.

[25]  arXiv:2005.14064 [pdf, ps, other]
Title: Codebook-Based Beam Tracking for Conformal Array-Enabled UAV MmWave Networks
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

Millimeter wave (mmWave) communications can potentially meet the high data-rate requirements of unmanned aerial vehicle (UAV) networks. However, as the prerequisite of mmWave communications, the narrow directional beam tracking is very challenging because of the three-dimensional (3D) mobility and attitude variation of UAVs. Aiming to address the beam tracking difficulties, we propose to integrate the conformal array (CA) with the surface of each UAV, which enables the full spatial coverage and the agile beam tracking in highly dynamic UAV mmWave networks. More specifically, the key contributions of our work are three-fold. 1) A new mmWave beam tracking framework is established for the CA-enabled UAV mmWave network. 2) A specialized hierarchical codebook is constructed to drive the directional radiating element (DRE)-covered cylindrical conformal array (CCA), which contains both the angular beam pattern and the subarray pattern to fully utilize the potential of the CA. 3) A codebook-based multiuser beam tracking scheme is proposed, where the Gaussian process machine learning enabled UAV position/attitude prediction is developed to improve the beam tracking efficiency in conjunction with the tracking-error aware adaptive beamwidth control. Simulation results validate the effectiveness of the proposed codebook-based beam tracking scheme in the CA-enabled UAV mmWave network, and demonstrate the advantages of CA over the conventional planar array in terms of spectrum efficiency and outage probability in the highly dynamic scenarios.

[26]  arXiv:2005.14082 [pdf, other]
Title: A Vision to Smart Radio Environment: Surface Wave Communication Superhighways
Comments: 7 pages, 6 figures
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)

Complementary to traditional approaches that focus on transceiver design for bringing the best out of unstable, lossy fading channels, one radical development in wireless communications that has recently emerged is to pursue a smart radio environment by using software-defined materials or programmable metasurfaces for establishing favourable propagation conditions. This article portrays a vision of communication superhighways enabled by surface wave (SW) propagation on "smart surfaces" for future smart radio environments. The concept differs from the mainstream efforts of using passive elements on a large surface for bouncing off radio waves intelligently towards intended user terminals. In this vision, energy efficiency will be ultra-high, due to much less pathloss compared to free space propagation, and the fact that SW is inherently confined to the smart surface not only greatly simplifies the task of interference management, but also makes possible exceptionally localized high-speed interference-free data access. We shall outline the opportunities and associated challenges arising from the SW paradigm. We shall also attempt to shed light on several key enabling technologies that make this realizable. One important technology which will be discussed is a software-controlled fluidic waveguiding architecture that permits dynamic creation of high-throughput data highways.

[27]  arXiv:2005.14115 [pdf]
Title: Amark: Automated Marking and Processing Techniques for Ambulatory ECG Data
Comments: 14 pages, 7 tables
Subjects: Signal Processing (eess.SP)

We describe techniques and specifications of MATLAB software to process ambulatory electrocardiogram (ECG) data. Through template-based beat identification and simple pattern recognition models on the intervals between regular heart beats, we filter noisy sections of waveform and ectopic beats. Our end-to-end process can be used towards analysis of ECG and calculation of heart rate variability metrics after beat adjustments, removals and interpolation. Classification and noise detection are assessed on the human-annotated MIT-BIH Arrhythmia and Noise Stress Test Databases.

[28]  arXiv:2005.14117 [pdf, other]
Title: Knowledge-Driven Learning via Experts Consult for Thyroid Nodule Classification
Subjects: Image and Video Processing (eess.IV); Machine Learning (cs.LG); Machine Learning (stat.ML)

Computer-aided diagnosis (CAD) is becoming a prominent approach to assist clinicians spanning across multiple fields. These automated systems take advantage of various computer vision (CV) procedures, as well as artificial intelligence (AI) techniques, so that a diagnosis of a given image (e.g., computed tomography and ultrasound) can be formulated. Advances in both areas (CV and AI) are enabling ever increasing performances of CAD systems, which can ultimately avoid performing invasive procedures such as fine-needle aspiration. In this study, we focus on thyroid ultrasonography to present a novel knowledge-driven classification framework. The proposed system leverages cues provided by an ensemble of experts, in order to guide the learning phase of a densely connected convolutional network (DenseNet). The ensemble is composed of various networks pretrained on ImageNet, including AlexNet, ResNet, VGG, and others, so that previously computed feature parameters could be used to create ultrasonography domain experts via transfer learning, decreasing, moreover, the number of samples required for training. To validate the proposed method, extensive experiments were performed, providing detailed performances for both the experts ensemble and the knowledge-driven DenseNet. The obtained results show how the proposed system can become a great asset when formulating a diagnosis, by leveraging previous knowledge derived from a consult.

[29]  arXiv:2005.14181 [pdf, other]
Title: Bayesian Restoration of Audio Degraded by Low-Frequency Pulses Modeled via Gaussian Process
Comments: 10 pages, 4 figures, 2 tables. Submitted to IEEE Journal of Selected Topics in Signal Processing - Special Issue "Reconstruction of audio from incomplete or highly degraded observations"
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP); Applications (stat.AP); Machine Learning (stat.ML)

A common defect found when reproducing old vinyl and gramophone recordings with mechanical devices is long pulses with significant low-frequency content caused by the interaction of the arm-needle system with deep scratches or even breakages on the media surface. Previous approaches to their suppression on digital counterparts of the recordings depend on a prior estimation of the pulse location, usually performed via heuristic methods. This paper proposes a novel Bayesian approach capable of jointly estimating the pulse location; interpolating the almost annihilated signal underlying the strong discontinuity that initiates the pulse; and also estimating the long pulse tail by a simple Gaussian Process, allowing its suppression from the corrupted signal. The posterior distribution for the model parameters as well as for the pulse is explored via Markov-Chain Monte Carlo (MCMC) algorithms. Controlled experiments indicate that the proposed method, while requiring significantly less user intervention, achieves perceptual results similar to those of previous approaches and performs well when dealing with naturally degraded signals.

Cross-lists for Fri, 29 May 20

[30]  arXiv:1803.05702 (cross-list from cs.IT) [pdf, other]
Title: Achieving Spatial Scalability for Coded Caching over Wireless Networks
Comments: 30 pages, 9 figures
Subjects: Information Theory (cs.IT); Systems and Control (eess.SY)

The coded caching scheme proposed by Maddah-Ali and Niesen considers the delivery of files in a given content library to users through a deterministic error-free network where a common multicast message is sent to all users at a fixed rate, independent of the number of users. In order to apply this paradigm to a wireless network, it is important to make sure that the common multicast rate does not vanish as the number of users increases. This paper focuses on a variant of coded caching successively proposed for the so-called combination network, where the multicast message is further encoded by a Maximum Distance Separable (MDS) code and the MDS-coded blocks are simultaneously transmitted from different Edge Nodes (ENs) (e.g., base stations or access points). Each user is equipped with multiple antennas and can select to decode a desired number of EN transmissions, while either nulling or treating as noise the others, depending on their strength. The system is reminiscent of the so-called evolved Multimedia Broadcast Multicast Service (eMBMS), in the sense that the fundamental underlying transmission mechanism is multipoint multicasting, where each user can independently and individually (in a user-centric manner) decide which EN to decode, without any explicit association of users to ENs. We study the performance of the proposed system when users and ENs are distributed according to homogeneous Poisson Point Processes in the plane and the propagation is affected by Rayleigh fading and distance dependent pathloss. Our analysis allows the system optimization with respect to the MDS coding rate. Also, we show that the proposed system is fully scalable, in the sense that it can support an arbitrarily large number of users, while maintaining a non-vanishing per-user delivery rate.

[31]  arXiv:2005.13594 (cross-list from cs.NE) [pdf]
Title: Antenna Optimization Using a New Evolutionary Algorithm Based on Tukey-Lambda Probability Distribution
Comments: 5 pages, to be submitted to IEEE ACCESS
Subjects: Neural and Evolutionary Computing (cs.NE); Signal Processing (eess.SP); Applied Physics (physics.app-ph)

In this paper, we introduce a new evolutionary optimization algorithm based on Tukey's symmetric lambda distribution. The Tukey distribution is defined by 3 parameters: the shape parameter, the scale parameter, and the location parameter or average value. Various other distributions can be approximated by changing the shape parameter, so the Tukey distribution can encompass a large class of probability distributions. Because of these attributes, an Evolutionary Programming (EP) algorithm with a Tukey mutation operator may perform well in a large class of optimization problems. Various schemes in implementation of EP with Tukey distribution are discussed, and the resulting algorithms are applied to selected test functions and antenna design problems.
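
As an illustration of the mutation idea described above, the following is a minimal sketch that draws mutation steps from the symmetric Tukey lambda distribution by inverse-transform sampling of its standard quantile function, Q(p) = (p^lam - (1 - p)^lam) / lam. The function names, the use of the parent gene as the location parameter, and the step size `scale` are assumptions made for this example; the abstract does not specify the authors' actual scheme.

```python
import math
import random


def tukey_lambda_quantile(p: float, lam: float) -> float:
    """Quantile function of the standard symmetric Tukey lambda distribution."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if lam == 0.0:
        # Limiting case lam -> 0 is the logistic distribution.
        return math.log(p / (1.0 - p))
    return (p ** lam - (1.0 - p) ** lam) / lam


def tukey_mutate(parent, lam, scale, rng):
    """Return a mutated copy of `parent` (hypothetical EP mutation operator).

    Each gene acts as the location parameter, `scale` as the scale parameter,
    and `lam` as the shape parameter of the Tukey lambda distribution.
    """
    child = []
    for gene in parent:
        u = rng.random()
        while u == 0.0:  # random() may return 0.0, which the quantile rejects
            u = rng.random()
        child.append(gene + scale * tukey_lambda_quantile(u, lam))
    return child


if __name__ == "__main__":
    rng = random.Random(0)
    # lam = 1 gives uniform steps on (-scale, scale); lam near 0.14 is roughly normal.
    print(tukey_mutate([1.0, 2.0, 3.0], lam=0.14, scale=0.1, rng=rng))
```

Changing `lam` alone switches the step distribution between heavy-tailed, bell-shaped, and bounded forms, which is the property the abstract relies on.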

[32]  arXiv:2005.13600 (cross-list from cs.HC) [pdf]
Title: Eye Gaze Controlled Interfaces for Head Mounted and Multi-Functional Displays in Military Aviation Environment
Comments: Presented at IEEE Aerospace 2020
Subjects: Human-Computer Interaction (cs.HC); Image and Video Processing (eess.IV)

Eye gaze controlled interfaces allow us to directly manipulate a graphical user interface just by looking at it. This technology has great potential in military aviation, in particular for operating different displays in situations where pilots' hands are occupied with flying the aircraft. This paper reports studies on analyzing accuracy of eye gaze controlled interface inside aircraft undertaking representative flying missions. We reported that pilots can undertake representative pointing and selection tasks at less than 2 secs on average. Further, we evaluated the accuracy of eye gaze tracking glass under various G-conditions and analyzed its failure modes. We observed that the accuracy of an eye tracker is less than 5 degrees of visual angle up to +3G, although it is less accurate at minus 1G and plus 5G. We observed that the eye tracker may fail to track under higher external illumination. We also infer that an eye tracker to be used in military aviation needs to have a larger vertical field of view than the presently available systems. We used this analysis to develop eye gaze trackers for Multi-Functional displays and Head Mounted Display System. We obtained significant reduction in pointing and selection times using our proposed HMDS system compared to traditional TDS.

[33]  arXiv:2005.13605 (cross-list from cs.CV) [pdf, other]
Title: D2D: Keypoint Extraction with Describe to Detect Approach
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)

In this paper, we present a novel approach that exploits the information within the descriptor space to propose keypoint locations. Detect then describe, or detect and describe jointly are two typical strategies for extracting local descriptors. In contrast, we propose an approach that inverts this process by first describing and then detecting the keypoint locations. Describe-to-Detect (D2D) leverages successful descriptor models without the need for any additional training. Our method selects keypoints as salient locations with high information content which is defined by the descriptors rather than some independent operators. We perform experiments on multiple benchmarks including image matching, camera localisation, and 3D reconstruction. The results indicate that our method improves the matching performance of various descriptors and that it generalises across methods and tasks.

[34]  arXiv:2005.13681 (cross-list from cs.CL) [pdf, other]
Title: Phone Features Improve Speech Translation
Comments: Accepted to ACL2020
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)

End-to-end models for speech translation (ST) more tightly couple speech recognition (ASR) and machine translation (MT) than a traditional cascade of separate ASR and MT models, with simpler model architectures and the potential for reduced error propagation. Their performance is often assumed to be superior, though in many conditions this is not yet the case. We compare cascaded and end-to-end models across high, medium, and low-resource conditions, and show that cascades remain stronger baselines. Further, we introduce two methods to incorporate phone features into ST models. We show that these features improve both architectures, closing the gap between end-to-end models and cascades, and outperforming previous academic work -- by up to 9 BLEU on our low-resource setting.

[35]  arXiv:2005.13738 (cross-list from cs.CR) [pdf, other]
Title: Model-Based Risk Assessment for Cyber Physical Systems Security
Subjects: Cryptography and Security (cs.CR); Systems and Control (eess.SY)

Traditional techniques for Cyber-Physical Systems (CPS) security design either treat the cyber and physical systems independently, or do not address the specific vulnerabilities of real time embedded controllers and networks used to monitor and control physical processes. In this work, we develop and test an integrated model-based approach for CPS security risk assessment utilizing a CPS testbed with real-world industrial controllers and communication protocols. The testbed monitors and controls an exothermic Continuous Stirred Tank Reactor (CSTR) simulated in real-time. CSTR is a fundamental process unit in many industries, including Oil & Gas, Petrochemicals, Water treatment, and nuclear industry. In addition, the process is rich in terms of hazardous scenarios that could be triggered by cyber attacks due to the lack of possible mechanical protection. The paper presents an integrated approach to analyze and design the cyber security system for a given CPS where the physical threats are identified first to guide the risk assessment process. A mathematical model is derived for the physical system using a hybrid automaton to enumerate potential hazardous states of the system. The cyber system is then analyzed using network and data flow models to develop the attack scenarios that may lead to the identified hazards. Finally, the attack scenarios are performed on the testbed and observations are obtained on the possible ways to prevent and mitigate the attacks. The insights gained from the experiments result in several key findings, including the expressive power of hybrid automaton in security risk assessment, the hazard development time and its impact on cyber security design, and the tight coupling between the physical and the cyber systems for CPS that requires an integrated design approach to achieve cost-effective and secure designs.

[36]  arXiv:2005.13749 (cross-list from cs.RO) [pdf]
Title: IoT-based Remote Control Study of a Robotic Trans-esophageal Ultrasound Probe via LAN and 5G
Comments: 9 pages, 5 figures, to be submitted to MICCAI ASMUS 2020 workshop
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

A robotic trans-esophageal echocardiography (TEE) probe has been recently developed to address the problems with manual control in the X-ray environment when a conventional probe is used for interventional procedure guidance. However, the robot was exclusively to be used in local areas and the effectiveness of remote control has not been scientifically tested. In this study, we implemented an Internet-of-things (IoT)-based configuration to the TEE robot so the system can set up a local area network (LAN) or be configured to connect to an internet cloud over 5G. To investigate the remote control, backlash hysteresis effects were measured and analysed. A joystick-based device and a button-based gamepad were then employed and compared with the manual control in a target reaching experiment for the two steering axes. The results indicated different hysteresis curves for the left-right and up-down steering axes with the input wheel's deadbands found to be 15 deg and deg, respectively. Similar magnitudes of positioning errors at approximately 0.5 deg and maximum overshoots at around 2.5 deg were found when manually and robotically controlling the TEE probe. The amount of time to finish the task indicated a better performance using the button-based gamepad over joystick-based device, although both were worse than the manual control. It is concluded that the IoT-based remote control of the TEE probe is feasible and a trained user can accurately manipulate the probe. The main identified problem was the backlash hysteresis in the steering axes, which can result in continuous oscillations and overshoots.

[37]  arXiv:2005.13799 (cross-list from cs.CV) [pdf, other]
Title: Explainable deep learning models in medical image analysis
Comments: Preprint submitted to J.Imaging, MDPI
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)

Deep learning methods have been very effective for a variety of medical diagnostic tasks and have even beaten human experts on some of those. However, the black-box nature of the algorithms has restricted clinical use. Recent explainability studies aim to show the features that influence the decision of a model the most. The majority of literature reviews of this area have focused on taxonomy, ethics, and the need for explanations. A review of the current applications of explainable deep learning for different medical imaging tasks is presented here. The various approaches, challenges for clinical deployment, and the areas requiring further research are discussed here from a practical standpoint of a deep learning researcher designing a system for the clinical end-users.

[38]  arXiv:2005.13807 (cross-list from math.OC) [pdf, other]
Title: Explicit Distributed and Localized Model Predictive Control via System Level Synthesis
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)

An explicit Model Predictive Control algorithm for large-scale structured linear systems is presented. We base our results on Distributed and Localized Model Predictive Control (DLMPC), a closed-loop model predictive control scheme based on the System Level Synthesis (SLS) framework wherein only local state and model information needs to be exchanged between subsystems for the computation and implementation of control actions. We provide an explicit solution for each of the subproblems resulting from the distributed MPC scheme. We show that given the separability of the problem, the explicit solution is only divided into three regions per state and input instantiation, making the point location problem very efficient. Moreover, given the locality constraints, the subproblems are of much smaller dimension than the full problem, which significantly reduces the computational overhead of explicit solutions. We conclude with numerical simulations to demonstrate the computational advantages of our method, in which we show a large improvement in runtime per MPC iteration as compared with the results of computing the optimization with a solver online.

[39]  arXiv:2005.13813 (cross-list from cs.CR) [pdf, other]
Title: Detection of Lying Electrical Vehicles in Charging Coordination Application Using Deep Learning
Subjects: Cryptography and Security (cs.CR); Signal Processing (eess.SP)

The simultaneous charging of many electric vehicles (EVs) stresses the distribution system and may cause grid instability in severe cases. The best way to avoid this problem is by charging coordination. The idea is that the EVs should report data (such as state-of-charge (SoC) of the battery) to run a mechanism to prioritize the charging requests and select the EVs that should charge during this time slot and defer other requests to future time slots. However, EVs may lie and send false data to receive high charging priority illegally. In this paper, we first study this attack to evaluate the gains of the lying EVs and how their behavior impacts the honest EVs and the performance of charging coordination mechanism. Our evaluations indicate that lying EVs have a greater chance to get charged compared to honest EVs and they degrade the performance of the charging coordination mechanism. Then, an anomaly based detector that is using deep neural networks (DNN) is devised to identify the lying EVs. To do that, we first create an honest dataset for charging coordination application using real driving traces and information revealed by EV manufacturers, and then we also propose a number of attacks to create malicious data. We trained and evaluated two models, which are the multi-layer perceptron (MLP) and the gated recurrent unit (GRU) using this dataset and the GRU detector gives better results. Our evaluations indicate that our detector can detect lying EVs with high accuracy and low false positive rate.
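
To make the attack surface described above concrete, here is a toy sketch of a charging coordination step. The priority rule (lowest reported SoC first), the slot capacity, and all names are hypothetical choices for illustration only; the abstract does not state the actual prioritization mechanism.

```python
def select_for_slot(reported_soc, capacity):
    """Split EVs into (charged, deferred) for one time slot.

    reported_soc maps an EV identifier to its self-reported state of charge
    in [0, 1]. Hypothetical rule: EVs reporting the lowest SoC charge first,
    with ties broken by identifier.
    """
    if capacity < 0:
        raise ValueError("capacity must be non-negative")
    ranked = sorted(reported_soc, key=lambda ev: (reported_soc[ev], ev))
    return ranked[:capacity], ranked[capacity:]


if __name__ == "__main__":
    honest = {"ev1": 0.2, "ev2": 0.5, "ev3": 0.8}
    print(select_for_slot(honest, capacity=1))

    # ev3 under-reports its SoC to jump the queue, displacing ev1.
    lying = dict(honest, ev3=0.1)
    print(select_for_slot(lying, capacity=1))
```

Because the coordinator in this sketch trusts the reported value, a single false report is enough to displace an honest EV, which is the behavior a detector would need to flag.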

[40]  arXiv:2005.13827 (cross-list from cs.CL) [pdf, other]
Title: Subword RNNLM Approximations for Out-Of-Vocabulary Keyword Search
Comments: INTERSPEECH 2019
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)

In spoken Keyword Search, the query may contain out-of-vocabulary (OOV) words not observed when training the speech recognition system. Using subword language models (LMs) in the first-pass recognition makes it possible to recognize the OOV words, but even the subword n-gram LMs suffer from data sparsity. Recurrent Neural Network (RNN) LMs alleviate the sparsity problems but are not suitable for first-pass recognition as such. One way to solve this is to approximate the RNNLMs by back-off n-gram models. In this paper, we propose to interpolate the conventional n-gram models and the RNNLM approximation for better OOV recognition. Furthermore, we develop a new RNNLM approximation method suitable for subword units: It produces variable-order n-grams to include long-span approximations and considers also n-grams that were not originally observed in the training corpus. To evaluate these models on OOVs, we set up Arabic and Finnish Keyword Search tasks concentrating only on OOV words. On these tasks, interpolating the baseline RNNLM approximation and a conventional LM outperforms the conventional LM in terms of the Maximum Term Weighted Value for single-character subwords. Moreover, replacing the baseline approximation with the proposed method achieves the best performance on both multi- and single-character subwords.
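
The interpolation step mentioned above can be sketched as a plain linear mixture of two next-subword distributions for a fixed history. This is an assumption about the general form only: the function name, the dictionary representation, and the weight are made up for this example, and a real system would interpolate complete back-off models rather than single distributions.

```python
def interpolate_lm(p_ngram, p_rnn_approx, weight):
    """Linearly interpolate two next-subword distributions.

    p_ngram and p_rnn_approx map subword -> probability for the same history.
    weight is the share given to the conventional n-gram model; subwords
    missing from one model contribute probability 0 from that model.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    vocab = set(p_ngram) | set(p_rnn_approx)
    return {
        w: weight * p_ngram.get(w, 0.0) + (1.0 - weight) * p_rnn_approx.get(w, 0.0)
        for w in vocab
    }


if __name__ == "__main__":
    ngram = {"a": 0.7, "b": 0.3}
    rnn_approx = {"a": 0.4, "b": 0.5, "c": 0.1}  # "c" is unseen by the n-gram model
    print(interpolate_lm(ngram, rnn_approx, weight=0.5))
```

The mixture stays a valid distribution whenever both inputs are, and it assigns non-zero mass to subwords that only one of the two models covers.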

[41]  arXiv:2005.13884 (cross-list from cs.CV) [pdf]
Title: CGGAN: A Context Guided Generative Adversarial Network For Single Image Dehazing
Comments: 12 pages, 7 figures, 3 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Image and Video Processing (eess.IV)

Image haze removal is highly desired for the application of computer vision. This paper proposes a novel Context Guided Generative Adversarial Network (CGGAN) for single image dehazing, in which a novel encoder-decoder is employed as the generator. It consists of a feature-extraction-net, a context-extraction-net, and a fusion-net in sequence. The feature-extraction-net acts as an encoder and is used for extracting haze features. The context-extraction-net is a multi-scale parallel pyramid decoder and is used for extracting the deep features of the encoder and generating a coarse dehazing image. The fusion-net is a decoder and is used for obtaining the final haze-free image. To obtain better results, multi-scale information obtained during the decoding process of the context extraction decoder is used for guiding the fusion decoder. By introducing an extra coarse decoder to the original encoder-decoder, the CGGAN can make better use of the deep feature information extracted by the encoder. To ensure our CGGAN works effectively for different haze scenarios, different loss functions are employed for the two decoders. Experimental results show the advantage and the effectiveness of our proposed CGGAN; evident improvements over existing state-of-the-art methods are obtained.

[42]  arXiv:2005.13896 (cross-list from cs.NI) [pdf, other]
Title: Simulation and Optimization of Content Delivery Networks considering User Profiles and Preferences of Internet Service Providers
Journal-ref: Winter Simulation Conference 2016
Subjects: Networking and Internet Architecture (cs.NI); Distributed, Parallel, and Cluster Computing (cs.DC); Systems and Control (eess.SY)

A Content Delivery Network (CDN) is a dynamic and complex service system. It causes a huge amount of traffic on the network infrastructure of Internet Service Providers (ISPs). Oftentimes, CDN providers and ISPs struggle to find an efficient and appropriate way to cooperate for mutual benefits. This challenge is key to push the quality of service (QoS) for the end-user. We model, simulate, and optimize the behavior of a CDN to provide cooperative solutions and to improve the QoS. Therefore, we determine reasonable server locations, balance the number of servers and improve the user assignments to the servers. These aspects influence run time effects like caching at the server, response time and network load at specific links. In particular, user request history and profiles are considered to improve the overall performance. Since we consider multiple objectives, we aim to provide a diverse set of Pareto optimal solutions using simulation based optimization.

[43]  arXiv:2005.13905 (cross-list from cs.NI) [pdf, other]
Title: Modeling the Location Selection of Mirror Servers in Content Delivery Networks
Comments: Conference on Services Computing 2016
Subjects: Networking and Internet Architecture (cs.NI); Distributed, Parallel, and Cluster Computing (cs.DC); Systems and Control (eess.SY)

For a provider of a Content Delivery Network (CDN), the location selection of mirror servers is a complex optimization problem. Generally, the objective is to place the nodes centrally such that all customers have convenient access to the service according to their demands. It is an instance of the k-center problem, which is proven to be NP-hard. Determining reasonable server locations directly influences run time effects and future service costs. We model, simulate, and optimize the properties of a content delivery network. Specifically, we consider the server locations in a network infrastructure with prioritized customers and weighted connections. A simulation model for the servers is necessary to analyze the caching behavior in accordance with the targeted customer requests. We analyze the problem and compare different optimization strategies. For our simulation, we employ various realistic scenarios and evaluate several performance indicators. Our new optimization approach shows a significant improvement. The presented results are generally applicable to other domains with k-center problems, e.g., the placement of military bases, the planning and placement of facility locations, or data mining.
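
For readers unfamiliar with the k-center problem named above, the sketch below shows the classic greedy farthest-first heuristic, which is a standard 2-approximation in metric spaces. It is not the optimization approach of the paper: it uses plain Euclidean distances and ignores customer priorities, connection weights, and caching behavior. All names are chosen for this example.

```python
import math


def greedy_k_center(points, k):
    """Pick k centers by farthest-first traversal.

    Returns (center_indices, radius), where radius is the largest distance
    from any point to its nearest chosen center.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centers = [0]  # arbitrary start: the first point
    # dist[i] holds the distance from point i to its nearest chosen center.
    dist = [math.dist(points[0], p) for p in points]
    while len(centers) < k:
        farthest = max(range(len(points)), key=dist.__getitem__)
        centers.append(farthest)
        for i, p in enumerate(points):
            dist[i] = min(dist[i], math.dist(points[farthest], p))
    return centers, max(dist)


if __name__ == "__main__":
    customers = [(0, 0), (1, 0), (10, 0), (11, 0)]
    print(greedy_k_center(customers, k=2))
```

Each iteration adds the customer that is currently worst served, so the covering radius never increases as k grows.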

[44]  arXiv:2005.13924 (cross-list from cs.CV) [pdf, other]
Title: CNN-based Approach for Cervical Cancer Classification in Whole-Slide Histopathology Images
Comments: Presented at the ICLR 2020 Workshop on AI for Overcoming Global Disparities in Cancer Care (AI4CC)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Cervical cancer will cause 460 000 deaths per year by 2040, approximately 90% of them among Sub-Saharan African women. A constantly increasing incidence in Africa is making cervical cancer a priority for the World Health Organization (WHO) in terms of screening, diagnosis, and treatment. Conventionally, cancer diagnosis relies primarily on histopathological assessment, a deeply error-prone procedure requiring intelligent computer-aided systems as low-cost patient safety mechanisms, but the lack of labeled data in digital pathology limits their applicability. In this study, a few cervical tissue digital slides from the TCGA data portal were pre-processed to overcome whole-slide image obstacles and included in our proposed VGG16-CNN classification approach. Our results achieved an accuracy of 98.26% and an F1-score of 97.9%, which confirms the potential of transfer learning on this weakly-supervised task.

[45]  arXiv:2005.13945 (cross-list from math.OC) [pdf, ps, other]
Title: Event-triggered gain scheduling of reaction-diffusion PDEs
Comments: 20 pages, 5 figures, submitted to SICON
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)

This paper deals with the problem of boundary stabilization of 1D reaction-diffusion PDEs with a time- and space-varying reaction coefficient. The boundary control design relies on the backstepping approach. The gains of the boundary control are scheduled under two suitable event-triggered mechanisms. More precisely, gains are computed/updated on events according to two state-dependent event-triggering conditions: static-based and dynamic-based conditions, under which the Zeno behavior is avoided and well-posedness as well as exponential stability of the closed-loop system are guaranteed. Numerical simulations are presented to illustrate the results.

[46]  arXiv:2005.13983 (cross-list from cs.CV) [pdf, other]
Title: Uncertainty-Aware Blind Image Quality Assessment in the Laboratory and Wild
Comments: Under review
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Image and Video Processing (eess.IV)

Performance of blind image quality assessment (BIQA) models has been significantly boosted by end-to-end optimization of feature engineering and quality regression. Nevertheless, due to the distributional shifts between images simulated in the laboratory and captured in the wild, models trained on databases with synthetic distortions remain particularly weak at handling realistic distortions (and vice versa). To confront the cross-distortion-scenario challenge, we develop a unified BIQA model and an effective approach of training it for both synthetic and realistic distortions. We first sample pairs of images from the same IQA databases and compute a probability that one image of each pair is of higher quality as the supervisory signal. We then employ the fidelity loss to optimize a deep neural network for BIQA over a large number of such image pairs. We also explicitly enforce a hinge constraint to regularize uncertainty estimation during optimization. Extensive experiments on six IQA databases show the promise of the learned method in blindly assessing image quality in the laboratory and wild. In addition, we demonstrate the universality of the proposed training strategy by using it to improve existing BIQA models.
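
The pairwise training signal described above can be sketched as follows. The fidelity loss is written in its standard form, 1 - sqrt(p * p_hat) - sqrt((1 - p) * (1 - p_hat)). The Thurstone-style Gaussian comparison used here to turn two quality scores into a preference probability is an assumption for illustration, as are all function and parameter names; the abstract does not spell out how the probability is computed.

```python
import math


def preference_probability(mean_x, std_x, mean_y, std_y):
    """Probability that image x has higher quality than image y.

    Assumes independent Gaussian quality scores (a Thurstone-style model),
    which is an illustrative assumption rather than the paper's stated model.
    """
    denom = math.sqrt(std_x ** 2 + std_y ** 2)
    if denom == 0.0:
        raise ValueError("at least one standard deviation must be positive")
    z = (mean_x - mean_y) / denom
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF


def fidelity_loss(p, p_hat):
    """Fidelity loss between a target probability p and a prediction p_hat."""
    return 1.0 - math.sqrt(p * p_hat) - math.sqrt((1.0 - p) * (1.0 - p_hat))


if __name__ == "__main__":
    target = preference_probability(70.0, 5.0, 60.0, 5.0)     # from subjective scores
    predicted = preference_probability(0.8, 0.3, 0.5, 0.3)    # from hypothetical model outputs
    print(target, predicted, fidelity_loss(target, predicted))
```

The loss is zero when the predicted and target probabilities agree and reaches one when they are completely opposed, and it depends only on within-pair comparisons, which is what allows pairs drawn from databases with different score scales to be mixed.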

[47]  arXiv:2005.14042 (cross-list from math.NA) [pdf, other]
Title: Joint Reconstruction and Low-Rank Decomposition for Dynamic Inverse Problems
Subjects: Numerical Analysis (math.NA); Image and Video Processing (eess.IV); Optimization and Control (math.OC)

A primary interest in dynamic inverse problems is to identify the underlying temporal behaviour of the system from outside measurements. In this work we consider the case where the target can be represented by a decomposition of spatial and temporal basis functions and hence can be efficiently represented by a low-rank decomposition. We then propose a joint reconstruction and low-rank decomposition method based on nonnegative matrix factorisation to obtain the unknown from highly undersampled dynamic measurement data. The proposed framework allows for flexible incorporation of separate regularisers for spatial and temporal features. For the special case of a stationary operator, we can effectively use the decomposition to reduce the computational complexity and obtain a substantial speed-up. The proposed methods are evaluated on two simulated phantoms, and we compare the obtained results to a separate low-rank reconstruction and subsequent decomposition approach based on the widely used principal component analysis.
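
One generic way to write such a joint problem is sketched below; it may differ from the paper's exact model. All symbols are assumptions introduced here: $X \approx BC$ is the dynamic target with $N$ spatial points and $T$ time steps, $B \in \mathbb{R}^{N \times K}$ holds $K \ll \min(N, T)$ nonnegative spatial basis functions, $C \in \mathbb{R}^{K \times T}$ holds the nonnegative temporal ones, $A_t$ and $y_t$ are the forward operator and the measurement at time $t$, and $\mathcal{R}_{\mathrm{s}}$, $\mathcal{R}_{\mathrm{t}}$ are the spatial and temporal regularisers with weights $\alpha, \beta \ge 0$.

$$
\min_{B \ge 0,\; C \ge 0} \; \frac{1}{2} \sum_{t=1}^{T} \left\| A_t \, (BC)_{:,t} - y_t \right\|_2^2 + \alpha \, \mathcal{R}_{\mathrm{s}}(B) + \beta \, \mathcal{R}_{\mathrm{t}}(C)
$$

In this notation the stationary case corresponds to $A_t = A$ for all $t$.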

[48]  arXiv:2005.14073 (cross-list from stat.ML) [pdf, other]
Title: Robust estimation via generalized quasi-gradients
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Signal Processing (eess.SP); Statistics Theory (math.ST); Computation (stat.CO)

We explore why many recently proposed robust estimation problems are efficiently solvable, even though the underlying optimization problems are non-convex. We study the loss landscape of these robust estimation problems, and identify the existence of "generalized quasi-gradients". Whenever these quasi-gradients exist, a large family of low-regret algorithms are guaranteed to approximate the global minimum; this includes the commonly-used filtering algorithm.
For robust mean estimation of distributions under bounded covariance, we show that any first-order stationary point of the associated optimization problem is an approximate global minimum if and only if the corruption level $\epsilon < 1/3$. Consequently, any optimization algorithm that approaches a stationary point yields an efficient robust estimator with breakdown point $1/3$. With careful initialization and step size, we improve this to $1/2$, which is optimal.
For other tasks, including linear regression and joint mean and covariance estimation, the loss landscape is more rugged: there are stationary points arbitrarily far from the global minimum. Nevertheless, we show that generalized quasi-gradients exist and construct efficient algorithms. These algorithms are simpler than previous ones in the literature, and for linear regression we improve the estimation error from $O(\sqrt{\epsilon})$ to the optimal rate of $O(\epsilon)$ for small $\epsilon$ assuming certified hypercontractivity. For mean estimation with near-identity covariance, we show that a simple gradient descent algorithm achieves breakdown point $1/3$ and iteration complexity $\tilde{O}(d/\epsilon^2)$.

[49]  arXiv:2005.14107 (cross-list from cs.CV) [pdf, other]
Title: Unsupervised learning of multimodal image registration using domain adaptation with projected Earth Move's discrepancies
Comments: Medical Imaging with Deep Learning (accepted short paper) this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)

Multimodal image registration is a very challenging problem for deep learning approaches. Most current work focuses on either supervised learning that requires labelled training scans and may yield models that bias towards annotated structures or unsupervised approaches that are based on hand-crafted similarity metrics and may therefore not outperform their classical non-trained counterparts. We believe that unsupervised domain adaptation can be beneficial in overcoming the current limitations for multimodal registration, where good metrics are hard to define. Domain adaptation has so far been mainly limited to classification problems. We propose the first use of unsupervised domain adaptation for discrete multimodal registration. Based on a source domain for which quantised displacement labels are available as supervision, we transfer the output distribution of the network to better resemble the target domain (other modality) using classifier discrepancies. To improve upon the sliced Wasserstein metric for 2D histograms, we present a novel approximation that projects predictions into 1D and computes the L1 distance of their cumulative sums. Our proof-of-concept demonstrates the applicability of domain transfer from mono- to multimodal (multi-contrast) 2D registration of canine MRI scans and improves the registration accuracy from 33% (using sliced Wasserstein) to 44%.
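
The core computation mentioned above, projecting a 2D histogram into 1D and taking the L1 distance of cumulative sums, can be sketched as follows. This is a minimal sketch: the use of the two axis-aligned projections (row and column sums) and all function names are assumptions for illustration, and the paper may use different projections.

```python
from itertools import accumulate


def cumulative_l1(p, q):
    """L1 distance between the cumulative sums of two 1D histograms."""
    return sum(abs(a - b) for a, b in zip(accumulate(p), accumulate(q)))


def project(hist2d):
    """Project a 2D histogram onto its two axes (row sums, column sums)."""
    rows = [sum(row) for row in hist2d]
    cols = [sum(col) for col in zip(*hist2d)]
    return rows, cols


def projected_discrepancy(h1, h2):
    """Sum of the cumulative-sum L1 distances over both axis projections."""
    rows1, cols1 = project(h1)
    rows2, cols2 = project(h2)
    return cumulative_l1(rows1, rows2) + cumulative_l1(cols1, cols2)
```

For two 1D histograms of equal total mass on a unit-spaced grid, `cumulative_l1` equals the 1D earth mover's distance, which is why it is a natural building block for a projected approximation.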

Replacements for Fri, 29 May 20

[50]  arXiv:1904.04500 (replaced) [pdf, ps, other]
Title: Regional Robust Secure Precise Wireless Transmission Design for Multi-user Broadcasting System
Subjects: Signal Processing (eess.SP)
[51]  arXiv:1907.02644 (replaced) [pdf, other]
Title: PathologyGAN: Learning deep representations of cancer tissue
Comments: MIDL 2020 final version
Subjects: Image and Video Processing (eess.IV); Machine Learning (cs.LG); Machine Learning (stat.ML)
[52]  arXiv:1911.07731 (replaced) [pdf, other]
Title: Multi-modal Deep Guided Filtering for Comprehensible Medical Image Processing
Journal-ref: IEEE Transactions on Medical Imaging, vol. 39, no. 5, pp. 1703-1711, May 2020
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[53]  arXiv:1912.03151 (replaced) [src]
Title: NASNet: A Neuron Attention Stage-by-Stage Net for Single Image Deraining
Authors: Xu Qin, Zhilin Wang
Comments: under review by a conference
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[54]  arXiv:1912.04377 (replaced) [pdf, other]
Title: LSTM Neural Networks: Input to State Stability and Probabilistic Safety Verification
Comments: Accepted for Learning for dynamics & control (L4DC) 2020
Subjects: Systems and Control (eess.SY)
[55]  arXiv:2001.06595 (replaced) [pdf, other]
Title: On Optimal Multi-user Beam Alignment in Millimeter Wave Wireless Systems
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
[56]  arXiv:2001.10487 (replaced) [pdf, other]
Title: Closed-loop frequency analyses of reset systems
Subjects: Systems and Control (eess.SY)
[57]  arXiv:2002.03595 (replaced) [pdf, other]
Title: Representation Learning on Variable Length and Incomplete Wearable-Sensory Time Series
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Machine Learning (stat.ML)
[58]  arXiv:2002.06751 (replaced) [pdf, other]
Title: Second-order Conic Programming Approach for Wasserstein Distributionally Robust Two-stage Linear Programs
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
[59]  arXiv:2002.10505 (replaced) [pdf, other]
Title: Experiments with Tractable Feedback in Robotic Planning under Uncertainty: Insights over a wide range of noise regimes
Comments: arXiv admin note: substantial text overlap with arXiv:1909.08585, arXiv:2002.09478
Subjects: Optimization and Control (math.OC); Robotics (cs.RO); Systems and Control (eess.SY)
[60]  arXiv:2003.01866 (replaced) [pdf, other]
Title: Region adaptive graph fourier transform for 3d point clouds
Comments: 5 pages, 3 figures, accepted ICIP 2020
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Signal Processing (eess.SP)
[61]  arXiv:2003.06971 (replaced) [pdf, ps, other]
Title: Online Algorithms for Dynamic Matching Markets in Power Distribution Systems
Subjects: Systems and Control (eess.SY); Multiagent Systems (cs.MA)
[62]  arXiv:2004.03315 (replaced) [pdf, other]
Title: Learning Control Barrier Functions from Expert Demonstrations
Comments: Updated link to codebase, corrected minor errata
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Optimization and Control (math.OC)
[63]  arXiv:2004.06569 (replaced) [pdf, other]
Title: Improving Calibration and Out-of-Distribution Detection in Medical Image Segmentation with Convolutional Neural Networks
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Machine Learning (stat.ML)
[64]  arXiv:2004.11676 (replaced) [pdf, other]
Title: Automated diagnosis of COVID-19 with limited posteroanterior chest X-ray images using fine-tuned deep neural networks
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[65]  arXiv:2005.02898 (replaced) [pdf, other]
Title: Optimal Tuning of a Class of Reset Controllers using Higher-Order Describing Function Analysis: Application in Precision Motion Systems
Subjects: Systems and Control (eess.SY)
[66]  arXiv:2005.06161 (replaced) [pdf, other]
Title: Online Scheduling of a Residential Microgrid via Monte-Carlo Tree Search and a Learned Model
Comments: 10 pages, 11 figures, submitted to TSG
Subjects: Systems and Control (eess.SY)
[67]  arXiv:2005.08040 (replaced) [pdf, other]
Title: Artificial neural networks for 3D cell shape recognition from confocal images
Comments: 17 pages, 8 figures
Subjects: Quantitative Methods (q-bio.QM); Image and Video Processing (eess.IV)
[68]  arXiv:2005.10407 (replaced) [pdf, other]
Title: Leveraging Text Data Using Hybrid Transformer-LSTM Based End-to-End ASR in Transfer Learning
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[69]  arXiv:2005.12209 (replaced) [pdf, other]
Title: JSSR: A Joint Synthesis, Segmentation, and Registration System for 3D Multi-Modal Image Alignment of Large-scale Pathological CT Scans
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[70]  arXiv:2005.12639 (replaced) [pdf, other]
Title: Bayesian Generative Models for Knowledge Transfer in MRI Semantic Segmentation Problems
Comments: arXiv admin note: substantial text overlap with arXiv:1908.05480
Subjects: Image and Video Processing (eess.IV); Machine Learning (stat.ML)
[71]  arXiv:2005.12855 (replaced) [pdf, other]
Title: Towards computer-aided severity assessment: training and validation of deep neural networks for geographic extent and opacity extent scoring of chest X-rays for SARS-CoV-2 lung disease severity
Comments: 7 pages
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[72]  arXiv:2005.13061 (replaced) [pdf, other]
Title: Prediction of Thrombectomy Functional Outcomes using Multimodal Data
Comments: Accepted at Medical Image Understanding and Analysis (MIUA) 2020
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[73]  arXiv:2005.13071 (replaced) [pdf, other]
Title: Spatiotemporal motion prediction in free-breathing liver scans via a recurrent multi-scale encoder decoder
Subjects: Image and Video Processing (eess.IV)
[74]  arXiv:2005.13400 (replaced) [pdf]
Title: Pan-artifact Removing with Deep Learning, on ISEs
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)
[75]  arXiv:2005.13531 (replaced) [pdf, other]
Title: How to do Physics-based Learning
Comments: 3 pages, 2 figures, linked repository this https URL
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Signal Processing (eess.SP)