Electrical Engineering and Systems Science

New submissions

[ total of 123 entries: 1-123 ]

New submissions for Tue, 12 Nov 19

[1]  arXiv:1911.03452 [pdf, other]
Title: Safety-Critical Control Synthesis for network systems with Control Barrier Functions and Assume-Guarantee Contracts
Comments: arXiv admin note: text overlap with arXiv:1810.10636
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)

This paper presents a contract-based framework for safety-critical control synthesis for network systems. To handle the large state dimension of such systems, an assume-guarantee contract is used to break the large synthesis problem into smaller subproblems. Parameterized signal temporal logic (pSTL) is used to formally describe the behaviors of the subsystems, which we use as the template for the contract. We show that robust control invariant sets (RCIs) for the subsystems can be composed to form a robust control invariant set for the whole network system under a valid assume-guarantee contract. An epigraph algorithm is proposed to solve for a valid contract, an approach that has linear complexity for a sparse network and leads to a robust control invariant set for the whole network. When the controller is implemented with a control barrier function (CBF), the state of each subsystem is guaranteed to stay within the safe set. Furthermore, we propose a contingency tube Model Predictive Control (MPC) approach based on the robust control invariant set, which is capable of handling severe contingencies, including topology changes of the network. A power grid example is used to demonstrate the proposed method. The simulation result includes both set point control and contingency recovery, and the safety constraint is always satisfied.

[2]  arXiv:1911.03461 [pdf, other]
Title: AIM 2019 Challenge on Image Demoireing: Methods and Results
Comments: arXiv admin note: text overlap with arXiv:1911.02498
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

This paper reviews the first-ever image demoireing challenge that was part of the Advances in Image Manipulation (AIM) workshop, held in conjunction with ICCV 2019. This paper describes the challenge and focuses on the proposed solutions and their results. Demoireing is the difficult task of removing moire patterns from an image to reveal an underlying clean image. A new dataset, called LCDMoire, was created for this challenge and consists of 10,200 synthetically generated image pairs (moire and clean ground truth). The challenge was divided into two tracks. Track 1 targeted fidelity, measuring the ability of demoire methods to obtain a moire-free image compared with the ground truth, while Track 2 examined the perceptual quality of demoire methods. The tracks had 60 and 39 registered participants, respectively. A total of eight teams competed in the final testing phase. The entries span the current state of the art in the image demoireing problem.

[3]  arXiv:1911.03464 [pdf, other]
Title: Image Super-Resolution via Residual Blended Attention Generative Adversarial Network with Dual Discriminators
Comments: Submitted to Neurocomputing, current status: under review. arXiv admin note: substantial text overlap with arXiv:1906.06575, arXiv:1905.05084
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

This paper develops an image super-resolution algorithm based on a residual blended attention generative adversarial network with dual discriminators. In the generator, on the basis of a residual neural network, the proposed algorithm adds blended attention blocks so that the network concentrates more on specific channels and regions with abundant high-frequency details, increasing its feature expression capability. The feature maps are upsampled using sub-pixel convolutional layers to obtain the final high-resolution images. The discriminator part consists of two discriminators that work in the pixel domain and the feature domain, respectively. Both discriminators are designed as Wasserstein GAN structures to improve training stability and to overcome mode collapse. The dual discriminators and the generator are trained alternately, directing the generator to produce images with abundant high-frequency details through adversarial learning. The generator loss and the losses from the dual discriminators are fused to constrain the generator's training and further improve accuracy. Experimental results show that the proposed algorithm is significantly better than mainstream CNN-based algorithms on objective evaluation indicators such as Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity (SSIM) on several public benchmarks such as Set5 and Set14, and that the obtained images are closest to real images with sharp details, which demonstrates the effectiveness and superiority of the proposed algorithm.
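A sub-pixel convolutional layer increases spatial resolution by rearranging groups of r*r channels into an r-times larger grid (often called a pixel shuffle). The sketch below shows only that rearrangement step in plain Python, using the channel ordering common in deep learning libraries; the function name and the nested-list layout are illustrative assumptions, and the preceding learned convolution is omitted.

```python
def pixel_shuffle(x, r):
    """Rearrange x of shape [C*r*r][H][W] into shape [C][H*r][W*r].

    Assumed convention: out[c][h*r + i][w*r + j] = x[c*r*r + i*r + j][h][w].
    """
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    if c_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c_out = c_in // (r * r)
    out = [[[0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(h * r):
            for j in range(w * r):
                # Pick the source channel from the sub-pixel offset (i % r, j % r).
                src = c * r * r + (i % r) * r + (j % r)
                out[c][i][j] = x[src][i // r][j // r]
    return out


# Four 1x1 channels become one 2x2 channel when r = 2.
low_res = [[[0]], [[1]], [[2]], [[3]]]
high_res = pixel_shuffle(low_res, 2)
```

Each low-resolution pixel contributes one value per sub-pixel position, so no interpolation is involved; the upscaling quality comes entirely from the convolution that produced the channels.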

[4]  arXiv:1911.03471 [pdf, other]
Title: Arm Motion Classification Using Curve Matching of Maximum Instantaneous Doppler Frequency Signatures
Comments: 6 pages, 7 figures, 2020 IEEE radar conference. arXiv admin note: substantial text overlap with arXiv:1910.11176
Subjects: Signal Processing (eess.SP)

Hand and arm gesture recognition using the radio frequency (RF) sensing modality proves valuable in man-machine interfaces and smart environments. In this paper, we use curve matching techniques for measuring the similarity of the maximum instantaneous Doppler frequencies corresponding to different arm gestures. In particular, we apply both Fréchet and dynamic time warping (DTW) distances that, unlike the Euclidean (L2) and Manhattan (L1) distances, take into account both the location and the order of the points for rendering two curves similar or dissimilar. It is shown that improved arm gesture classification can be achieved by using the DTW method, in lieu of L2 and L1 distances, under the nearest neighbor (NN) classifier.
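To illustrate why DTW can suit time-shifted curves better than a pointwise distance, here is a minimal sketch of the textbook DTW recursion with a 1-nearest-neighbor rule. The curves, labels, and function names are made-up toy values, not data or code from the paper, and the absolute-difference local cost is an assumption.

```python
import math


def dtw_distance(a, b):
    """Textbook DTW with absolute-difference local cost, O(len(a) * len(b))."""
    n, m = len(a), len(b)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping steps.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def l2_distance(a, b):
    """Pointwise Euclidean distance between equal-length curves."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbor(query, templates, dist):
    """Return the label of the template curve closest to the query."""
    return min(templates, key=lambda t: dist(query, t[1]))[0]


# Hypothetical Doppler-envelope templates and a delayed copy of the first one.
templates = [("raise", [0, 1, 2, 3, 2, 1, 0, 0]),
             ("push", [0, 3, 0, 3, 0, 3, 0, 0])]
query = [0, 0, 1, 2, 3, 2, 1, 0]

label = nearest_neighbor(query, templates, dtw_distance)
```

In this toy case the query is the "raise" template delayed by one sample, so DTW aligns the two curves exactly while the L2 distance between them stays nonzero.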

[5]  arXiv:1911.03512 [pdf, other]
Title: Radar Human Motion Recognition Using Motion States and Two-Way Classifications
Subjects: Signal Processing (eess.SP)

We perform classification of activities of daily living (ADL) using a Frequency-Modulated Continuous Waveform (FMCW) radar. In particular, we consider contiguous motions that are inseparable in time. Both the micro-Doppler signature and the range-map are used to determine transitions from translation (walking) to in-place motions and vice versa, as well as to provide motion onset and offset times. The possible classes of activities before and after the translation motion can be handled separately by forward and background classifiers. The paper describes ADL in terms of states and transitioning actions, and sets a framework to deal with separable and inseparable contiguous motions. It is shown that considering only the physically possible classes of motions stemming from the current motion state improves classification rates compared to incorporating all ADL for any given time.

[6]  arXiv:1911.03534 [pdf, other]
Title: Optimal Torque Control of Permanent Magnet Synchronous Motors Using Adaptive Dynamic Programming
Comments: 9 Pages, 12 Figures, 4 Tables
Subjects: Systems and Control (eess.SY)

In this study, a new approach based on adaptive dynamic programming (ADP) is proposed to control permanent magnet synchronous motors (PMSMs). The control algorithm uses two neural networks, called critic and actor. The former is utilized to evaluate the cost and the latter is used to generate control signals. The training is done once offline, and the calculated optimal weights of the actor network are used in online control to achieve fast and accurate torque control of PMSMs. This algorithm is compared with field-oriented control (FOC) and direct torque control based on space vector modulation (DTC-SVM). Simulations and experimental results show that the proposed algorithm provides desirable results under both accurate and uncertain modeled dynamics.

[7]  arXiv:1911.03558 [pdf, other]
Title: Joint Demosaicing and Super-Resolution (JDSR): Network Design and Perceptual Optimization
Authors: Xuan Xu, Yanfang (Fanny) Ye, Xin Li
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Image demosaicing and super-resolution are two important tasks in the color imaging pipeline. So far they have been mostly independently studied in the open literature of deep learning; little is known about the potential benefit of formulating a joint demosaicing and super-resolution (JDSR) problem. In this paper, we propose an end-to-end optimization solution to the JDSR problem and demonstrate its practical significance in computational imaging. Our technical contributions are mainly two-fold. On network design, we have developed a Densely-connected Squeeze-and-Excitation Residual Network (DSERN) for JDSR. For the first time, we address the issue of spatio-spectral attention for color images and discuss how to achieve better information flow by smooth activation for JDSR. Experimental results have shown that a moderate PSNR/SSIM gain can be achieved by DSERN over previous naive network architectures. On perceptual optimization, we propose to leverage the latest ideas, including a relativistic discriminator and a pre-excitation perceptual loss function, to further improve the visual quality of reconstructed images. Our extensive experimental results have shown that the Texture-enhanced Relativistic average Generative Adversarial Network (TRaGAN) can produce both subjectively more pleasant images and objectively lower perceptual distortion scores than a standard GAN for JDSR. We have verified the benefit of JDSR to high-quality image reconstruction from real-world Bayer patterns collected by NASA Mars Curiosity.

[8]  arXiv:1911.03624 [pdf, other]
Title: Natural and Realistic Single Image Super-Resolution with Explicit Natural Manifold Discrimination
Comments: Presented in CVPR 2019
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Recently, many convolutional neural networks for single image super-resolution (SISR) have been proposed, which focus on reconstructing the high-resolution images in terms of objective distortion measures. However, the networks trained with objective loss functions generally fail to reconstruct the realistic fine textures and details that are essential for better perceptual quality. Recovering the realistic details remains a challenging problem, and only a few works have been proposed which aim at increasing the perceptual quality by generating enhanced textures. However, the generated fake details often produce undesirable artifacts, and the overall image looks somewhat unnatural. Therefore, in this paper, we present a new approach to reconstructing realistic super-resolved images with high perceptual quality, while maintaining the naturalness of the result. In particular, we focus on the domain prior properties of the SISR problem. Specifically, we define the naturalness prior in the low-level domain and constrain the output image in the natural manifold, which eventually generates more natural and realistic images. Our results show better naturalness compared to recent super-resolution algorithms, including perception-oriented ones.

[9]  arXiv:1911.03684 [pdf, other]
Title: An Algorithmic View on Optimal Storage Sizing
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)

Users can arbitrage against Time-of-Use (ToU) pricing with storage by charging in off-peak periods and discharging in peak periods. In this paper, we design the optimal control policy and solve the optimal investment problem for general ToU schemes. We formulate the problem as a dynamic program for efficient solution. Our approach remains applicable to multi-peaked ToU schemes. Simulation studies examine how the user's cost varies with the randomness of the user's demand; we also demonstrate the performance of our scheme when aggregating users for extra savings.
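As a rough illustration of the dynamic programming idea, the sketch below solves a heavily simplified storage-arbitrage problem. It assumes deterministic demand, lossless storage, integer energy levels, no selling back to the grid, and a linear investment cost. These assumptions, the function names, and all numbers are hypothetical and are not taken from the paper, which also treats random demand.

```python
def min_energy_cost(prices, demand, capacity, max_rate):
    """Backward DP over storage levels 0..capacity; storage starts empty."""
    value = [0.0] * (capacity + 1)  # leftover energy has no terminal value
    for t in range(len(prices) - 1, -1, -1):
        new_value = [float("inf")] * (capacity + 1)
        for level in range(capacity + 1):
            lo = max(0, level - max_rate)
            hi = min(capacity, level + max_rate)
            for nxt in range(lo, hi + 1):
                purchase = demand[t] + (nxt - level)
                if purchase < 0:
                    continue  # assumed: no selling back to the grid
                cost = prices[t] * purchase + value[nxt]
                new_value[level] = min(new_value[level], cost)
        value = new_value
    return value[0]


def best_capacity(prices, demand, max_rate, unit_cost, candidates):
    """Pick the capacity minimizing energy cost plus linear investment cost."""
    return min(candidates,
               key=lambda c: min_energy_cost(prices, demand, c, max_rate)
               + unit_cost * c)


# Hypothetical two-tier tariff: two off-peak periods, then two peak periods.
prices = [1, 1, 3, 3]
demand = [1, 1, 1, 1]
size = best_capacity(prices, demand, max_rate=1, unit_cost=1.5,
                     candidates=range(4))
```

Sweeping the capacity and re-solving the control problem is the simplest way to couple sizing with operation; it stays cheap here because each inner problem is a small table-filling DP.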

[10]  arXiv:1911.03699 [pdf]
Title: Multi-Bernoulli Mixture Filter: Complete Derivation and Sequential Monte Carlo Implementation
Authors: Sen Wang
Comments: 5 pages, 1 figure, journal
Subjects: Signal Processing (eess.SP)

The multi-Bernoulli mixture (MBM) filter is one of the exact closed-form multi-target Bayes filters in the random finite sets (RFS) framework; it uses the multi-Bernoulli mixture density as the multi-target conjugate prior. This filter is the variant of the Poisson multi-Bernoulli mixture filter obtained when the birth process is changed from a Poisson RFS to a multi-Bernoulli RFS or a multi-Bernoulli mixture RFS. On the other hand, the labeled multi-Bernoulli mixture filter reduces to the MBM filter when the labels are discarded. In this letter, we provide a complete derivation of the MBM filter in which the derivation of the update step does not use the probability generating functional. We also describe the sequential Monte Carlo implementation and adopt Gibbs sampling for truncating the MBM filtering density. Numerical simulation with a nonlinear measurement model shows that the MBM filter outperforms the classical probability hypothesis density filter.

[11]  arXiv:1911.03711 [pdf, other]
Title: Unsupervised adulterated red-chili pepper content transformation for hyperspectral classification
Comments: 10 pages
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Preserving red-chili quality is of utmost importance, and authorities demand quality-control techniques to detect, classify, and prevent impurities. For example, contamination of ground red chili, which is typically a food, with salt, wheat flour, wheat bran, or rice bran is a serious threat to people who are allergic to such items. This work presents the feasibility of utilizing visible and near-infrared (VNIR) hyperspectral imaging (HSI) to detect and classify the aforementioned adulterants in red chili. However, adulterated red chili data annotation is a big challenge for classification because the acquisition of labeled data for real-time supervised learning is expensive in terms of cost and time. Therefore, this study, for the very first time, proposes a novel approach to annotate the red chili samples using a clustering mechanism on the spectral response at the 500 nm wavelength, where red chili appears dark. The spectral samples are then classified as pure or adulterated using a one-class SVM. Classification performance reaches 99% for pure adulterants or pure red chili and 85% for adulterated samples. We further find that a single classification model is enough to detect any foreign substance in red chili pepper, rather than cascading multiple PLS regression models.

[12]  arXiv:1911.03723 [pdf, other]
Title: Deep learning for cardiac image segmentation: A review
Comments: Under review
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)

Deep learning has become the most widely used approach for cardiac image segmentation in recent years. In this paper, we provide a review of over 100 cardiac image segmentation papers using deep learning, which covers common imaging modalities including magnetic resonance imaging (MRI), computed tomography (CT), and ultrasound (US) and major anatomical structures of interest (ventricles, atria and vessels). In addition, a summary of publicly available cardiac image datasets and code repositories is included to provide a base for encouraging reproducible research. Finally, we discuss the challenges and limitations of current deep learning-based approaches (scarcity of labels, model generalizability across different domains, interpretability) and suggest potential directions for future research.

[13]  arXiv:1911.03728 [pdf, other]
Title: An Approximate Dynamic Programming Approach for Dual Stochastic Model Predictive Control
Subjects: Systems and Control (eess.SY)

Dual control explicitly addresses the problem of trading off active exploration and exploitation in the optimal control of partially unknown systems. While the problem can be cast in the framework of stochastic dynamic programming, exact solutions are only tractable for discrete state and action spaces of very small dimension due to a series of nested minimization and expectation operations. We propose an approximate dual control method for systems with continuous state and input domain based on a rollout dynamic programming approach, splitting the control horizon into a dual and an exploitation part. The dual part is approximated using a scenario tree generated by sampling the process noise and the unknown system parameters, for which the underlying distribution is updated via Bayesian estimation along the horizon. In the exploitation part, we fix the resulting parameter estimate of each scenario branch and compute an open-loop control sequence for the remainder of the horizon. The key benefit of the proposed sampling-based approximation is that it enables the formulation as one optimization problem that computes a collection of control sequences over the scenario tree, leading to a dual model predictive control formulation.

[14]  arXiv:1911.03737 [pdf, other]
Title: Physics-Informed Neural Networks for Power Systems
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Signal Processing (eess.SP)

This paper introduces for the first time, to our knowledge, a framework for physics-informed neural networks in power system applications. Exploiting the underlying physical laws governing power systems, and inspired by recent developments in the field of machine learning, this paper proposes a neural network training procedure that can make use of the wide range of mathematical models describing power system behavior, both in steady-state and in dynamics. Physics-informed neural networks require substantially less training data and result in much simpler neural network structures, while achieving exceptional accuracy. This work unlocks a wide range of opportunities in power systems, being able to determine dynamic states, such as rotor angles and frequency, and uncertain parameters such as inertia, damping, and system topology at unprecedented speed. This paper focuses on introducing the framework and showcases its potential using a single-machine infinite bus system as a guiding example. Physics-informed neural networks are shown to accurately determine rotor angle and frequency up to 87 times faster than conventional methods.
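The core of a physics-informed loss is a residual that measures how far a candidate trajectory is from satisfying the governing equation. The sketch below assumes the classical swing equation for a single-machine infinite bus system, m*delta'' + d*delta' + b*sin(delta) = p. The symbol names, parameter values, and the choice to pass derivatives in as plain numbers are illustrative assumptions, and in an actual physics-informed network the derivatives would come from automatic differentiation of the network output with respect to time.

```python
import math


def swing_residual(delta, d_delta, dd_delta, m, d, b, p):
    """Residual of the assumed swing equation m*delta'' + d*delta' + b*sin(delta) - p."""
    return m * dd_delta + d * d_delta + b * math.sin(delta) - p


def physics_loss(samples, m, d, b, p):
    """Mean squared residual over (delta, d_delta, dd_delta) triples."""
    return sum(swing_residual(*s, m, d, b, p) ** 2 for s in samples) / len(samples)


# Hypothetical per-unit parameters: inertia, damping, coupling, mechanical power.
m, d, b, p = 0.4, 0.15, 1.0, 0.5

# At the steady-state angle with zero derivatives, the residual vanishes.
delta_eq = math.asin(p / b)
loss_at_equilibrium = physics_loss([(delta_eq, 0.0, 0.0)], m, d, b, p)
```

During training, a term like `physics_loss` would typically be added to the ordinary data-fitting loss, so the network is penalized at collocation points where no measurements exist.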

[15]  arXiv:1911.03740 [pdf, other]
Title: On the design of convolutional neural networks for automatic detection of Alzheimer's disease
Comments: Accepted to NeurIPS 2019 ML4H workshop
Journal-ref: Proceedings of Machine Learning Research, 2019
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Signal Processing (eess.SP); Machine Learning (stat.ML)

Early detection is a crucial goal in the study of Alzheimer's Disease (AD). In this work, we describe several techniques to boost the performance of 3D convolutional neural networks trained to detect AD using structural brain MRI scans. Specifically, we provide evidence that (1) instance normalization outperforms batch normalization, (2) early spatial downsampling negatively affects performance, (3) widening the model brings consistent gains while increasing the depth does not, and (4) incorporating age information yields moderate improvement. Together, these insights yield an increment of approximately 14% in test accuracy over existing models when distinguishing between patients with AD, mild cognitive impairment, and controls in the ADNI dataset. Similar performance is achieved on an independent dataset.

[16]  arXiv:1911.03747 [pdf, other]
Title: OFDM With Hybrid Number and Index Modulation
Subjects: Signal Processing (eess.SP)

A novel transmission scheme is introduced for efficient data transmission by conveying additional information bits through jointly changing the index and number of active subcarriers within each orthogonal frequency division multiplexing (OFDM) subblock. The proposed scheme is different from the conventional OFDM-subcarrier number modulation (OFDM-SNM) and OFDM-index modulation (OFDM-IM), in which data bits are transmitted using either number or index of active subcarriers. The proposed modulation technique offers superior spectral and energy efficiency compared to its counterparts OFDM-SNM and OFDM-IM, especially at low modulation orders such as binary phase shift keying (BPSK) that can provide high reliability and low complexity, making it suitable for Internet of Things (IoT) applications that require better spectral and energy efficiency while enjoying high reliability and low complexity. Bit error rate (BER) performance analysis is provided for the proposed scheme, and Monte Carlo simulations are presented to prove the consistency of simulated BER with the analyzed one.

[17]  arXiv:1911.03749 [pdf, other]
Title: A Characterization of All Passivizing Input-Output Transformations of a Passive-Short System
Comments: 7 pages, 1 figure
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)

Passivity theory is one of the cornerstones of control theory, as it allows one to prove stability of a large-scale system while treating each component separately. In practice, many systems are not passive, and must be passivized in order to be included in the framework of passivity theory. Input-output transformations are the most general tool for passivizing systems, generalizing output-feedback and input-feedthrough. In this paper, we classify all possible input-output transformations that map a system with given shortage of passivity to a system with prescribed excess of passivity. We do so by using the connection between passivity theory and cones for SISO systems, and using the S-lemma for MIMO systems.

[18]  arXiv:1911.03750 [pdf]
Title: Speech Dereverberation and Noise Reduction for both diffusive noise field and point noise source in Binaural Hearing Aids: Preliminary Version
Comments: This is a preliminary version. The final work will be available in its full version soon
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

The multichannel Wiener filter (MWF) and its variations have been extensively applied to binaural hearing aids. However, its major drawback is the distortion of the binaural cues of the residual noise, changing the original acoustic scenario, which is of paramount importance for hearing impaired people. The MWF-IC method was previously proposed for joint speech dereverberation and noise reduction, preserving the interaural coherence (IC) of diffuse noise fields. In this work, we propose a new variation of the MWF-IC for both speech dereverberation and noise reduction, which preserves the original spatial characteristics of the residual noise for either diffuse fields or point sources. Objective measures and preliminary psychoacoustic experiments indicate the proposed method is capable of perceptually preserving the original spatialization of both types of noise, without significant performance loss in both speech dereverberation and noise reduction.

[19]  arXiv:1911.03759 [pdf, other]
Title: DeVLearn: A Deep Visual Learning Framework for Localizing Temporary Faults in Power Systems
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)

Frequently recurring transient faults in a transmission network may be indicative of impending permanent failures. Hence, determining their location is a critical task. This paper proposes a novel image embedding aided deep learning framework called DeVLearn for faulted line location using PMU measurements at generator buses. Inspired by breakthroughs in computer vision, DeVLearn represents measurements (one-dimensional time series data) as two-dimensional unthresholded Recurrence Plot (RP) images. These RP images preserve the temporal relationships present in the original time series and are used to train a deep Variational Auto-Encoder (VAE). The VAE learns the distribution of latent features in the images. Our results show that for faults on two different lines in the IEEE 68-bus network, DeVLearn is able to project PMU measurements into a two-dimensional space such that data for faults at different locations separate into well-defined clusters. This compressed representation may then be used with off-the-shelf classifiers for determining fault location. The efficacy of the proposed framework is demonstrated using local voltage magnitude measurements at two generator buses.
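An unthresholded recurrence plot turns a one-dimensional series into a square image whose entry (i, j) is the distance between samples i and j. The sketch below uses the simplest form, with scalar samples and absolute difference, plus a grayscale scaling step. Both function names and the scaling are illustrative assumptions, and the embedding and normalization actually used by DeVLearn are not specified in the abstract.

```python
def recurrence_plot(series):
    """Unthresholded recurrence plot: entry (i, j) is |x_i - x_j|."""
    return [[abs(a - b) for b in series] for a in series]


def to_grayscale(rp):
    """Scale a recurrence plot to integer pixel values in 0..255."""
    peak = max(max(row) for row in rp)
    if peak == 0:
        return [[0 for _ in row] for row in rp]
    return [[round(255 * v / peak) for v in row] for row in rp]


# Hypothetical voltage-magnitude samples.
rp = recurrence_plot([1.0, 2.0, 4.0])
image = to_grayscale(rp)
```

The resulting matrix is symmetric with a zero diagonal, and keeping the raw distances rather than thresholding them preserves amplitude information for a downstream image model.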

[20]  arXiv:1911.03765 [pdf]
Title: Optimal Technical and Economical Operation of Microgrids through the Implementation of Sequential Quadratic Programming Algorithm
Subjects: Systems and Control (eess.SY)

In this paper, the optimal operation of a grid-connected microgrid is investigated over a 24-hour interval for a specific day. For the optimal operation, criteria such as the simultaneous reduction in operation costs, losses, and voltage deviations, as well as the increase in reliability, are taken into consideration. Given the capacity of the installed units, part of the demanded energy is always supplied by the grid. In line with the objectives set, planning for the charge and discharge of storage systems as well as the interaction among distributed power generators are investigated. The operation process is formulated as an optimization problem, which is solved by the Sequential Quadratic Programming (SQP) algorithm. In addition, by implementing the Demand Response (DR) program in the optimal operation, the acquired results are compared with those obtained when this strategy is not employed. Simulation results indicate that the utilization of the DR program in the optimal operation of the microgrid leads to a number of improvements. More specifically, in addition to the decrease in the operation costs of the microgrid, the voltage deviation indices, reliability, and losses are improved compared with non-DR schemes.

[21]  arXiv:1911.03780 [pdf]
Title: Repurposing an Energy System Optimization Model for Seasonal Power Generation Planning
Comments: 22 pages, 5 figures
Journal-ref: Energy, 181: 1321-1330 (2019)
Subjects: Systems and Control (eess.SY); Physics and Society (physics.soc-ph)

Seasonal climate variations affect electricity demand, which in turn affects month-to-month electricity planning and operations. Electricity system planning at the monthly timescale can be improved by adapting climate forecasts to estimate electricity demand and utilizing energy models to estimate monthly electricity generation and associated operational costs. The objective of this paper is to develop and test a computationally efficient model that can support seasonal planning while preserving key aspects of system operation over hourly and daily timeframes. To do so, an energy system optimization model is repurposed for seasonal planning using features drawn from a unit commitment model. Different scenarios utilizing a well-known test system are used to evaluate the errors associated with both the repurposed energy system model and an imperfect load forecast. The results show that the energy system optimization model using an imperfect load forecast produces differences in monthly cost and generation levels that are less than 2% compared with a unit commitment model using a perfect load forecast. The enhanced energy system optimization model can be solved approximately 100 times faster than the unit commitment model, making it a suitable tool for future work aimed at evaluating seasonal electricity generation and demand under uncertainty.

[22]  arXiv:1911.03786 [pdf, other]
Title: Spatially Regularized Parametric Map Reconstruction for Fast Magnetic Resonance Fingerprinting
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Magnetic resonance fingerprinting (MRF) provides a unique concept for simultaneous and fast acquisition of multiple quantitative MR parameters. Despite its acquisition efficiency, clinical adoption of MRF is hindered by its dictionary-based reconstruction, which is computationally demanding and lacks scalability. Here, we propose a convolutional neural network-based reconstruction, which enables both accurate and fast reconstruction of parametric maps, and is adaptable based on the needs of spatial regularization and the capacity for the reconstruction. We evaluated the method using MRF T1-FF, an MRF sequence for T1 relaxation time of water and fat fraction mapping. We demonstrate the method's performance on a highly heterogeneous dataset consisting of 164 patients with various neuromuscular diseases imaged at thighs and legs. We empirically show the benefit of incorporating spatial regularization during the reconstruction and demonstrate that the method learns meaningful features from an MR physics perspective. Further, we investigate the ability of the method to handle highly heterogeneous morphometric variations and its generalization to anatomical regions unseen during training. The obtained results outperform the state of the art in deep learning-based MRF reconstruction. Coupled with fast MRF sequences, the proposed method has the potential to enable multiparametric MR imaging in a clinically feasible time.

[23]  arXiv:1911.03843 [pdf, other]
Title: Characterizing dynamically varying acoustic scenes from egocentric audio recordings in workplace setting
Comments: The paper is submitted to IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)

Devices capable of detecting and categorizing acoustic scenes have numerous applications such as providing context-aware user experiences. In this paper, we address the task of characterizing acoustic scenes in a workplace setting from audio recordings collected with wearable microphones. The acoustic scenes, tracked with Bluetooth transceivers, vary dynamically with time from the egocentric perspective of a mobile user. Our dataset contains experience-sampled long audio recordings collected from clinical providers in a hospital, who wore the audio badges during multiple work shifts. To handle the long egocentric recordings, we propose a Time Delay Neural Network (TDNN)-based segment-level modeling approach. The experiments show that the TDNN outperforms other models in the acoustic scene classification task. We investigate the effect of the primary speaker's speech in determining acoustic scenes from audio badges, and provide a comparison between the performance of different models. Moreover, we explore the relationship between the sequence of acoustic scenes experienced by the users and the nature of their jobs, and find that the scene sequence predicted by our model tends to possess a similar relationship. The initial promising results reveal numerous research directions for acoustic scene classification via wearable devices as well as egocentric analysis of dynamic acoustic scenes encountered by the users.

[24]  arXiv:1911.03847 [pdf]
Title: Power System Problems in Teaching Control Theory on Simulink
Authors: Maddumage Karunaratne, Christopher Gabany (Department of Electrical Engineering, University of Pittsburgh, Johnstown, PA, USA)
Subjects: Systems and Control (eess.SY)

This experiment demonstrates to engineering students that control system and power system theory are not orthogonal, but highly interrelated. It introduces a real-world power system problem to enhance the time domain State Space Modelling (SSM) skills of students. It also shows how power quality is affected in real-world scenarios. The power system was modeled in state space by following its circuit topology in a bottom-up fashion. The transmission line was switched on at two different time instants of the power generator's sinusoidal wave. The Fourier transform was used to analyze the resulting line currents, and it validated the harmonic components expected from power system theory. Students understood the effects of switching transients at various times on the supply voltage sinusoid within control theory and learned time domain analysis. They were surveyed to gauge their perception of the project. Results from a before/after assessment analyzed using t-tests showed a statistically significant improvement in learning of SSM.

[25]  arXiv:1911.03870 [pdf, other]
Title: Synthesis of Feedback Controller for Nonlinear Control Systems with Optimal Region of Attraction
Subjects: Systems and Control (eess.SY); Robotics (cs.RO); Optimization and Control (math.OC)

The problem of computing and characterizing the Region of Attraction (ROA), with its many variations, has a long tradition in safety-critical systems and control theory. It is closely connected to Lyapunov functions, which are considered the centerpiece of stability theory for nonlinear dynamical systems. Agents may be imperfect because of limitations in their sensors, which ultimately restrict their ability to fully observe potential adversaries in the environment. Therefore, while interacting with humans, an autonomous robot should safely explore the outdoor environment by avoiding dangerous states that may cause physical harm to both the system and the environment. In this paper we address this problem and propose a framework for learning policies that adapt to the shape of the largest safe region in the state space. At the outset, the model is trained to learn an accurate safety certificate for the nonlinear closed-loop dynamical system by constructing a Lyapunov Neural Network. The current work is also an extension of previous work on computing the ROA under a fixed policy. Specifically, we discuss how to design a state feedback controller by optimizing a typical performance objective function, and we demonstrate our method on a simulated inverted pendulum, which clearly shows how this model can be used to resolve issues of trade-offs and extra design freedom.

[26]  arXiv:1911.03884 [pdf, other]
Title: Learning Koopman Operator under Dissipativity Constraints
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)

This paper addresses a learning problem for nonlinear dynamical systems that incorporates any specified dissipativity property. The nonlinear systems are described by the Koopman operator, which is a linear operator defined on the infinite-dimensional lifted state space. The problem of learning the Koopman operator under specified quadratic dissipativity constraints is formulated and addressed. The learning problem belongs to a class of non-convex optimization problems due to nonlinear constraints and is numerically intractable. By applying the change of variable technique and the convex overbounding approximation, the problem is reduced to sequential convex optimization and is solved in a numerically efficient manner. Finally, a numerical simulation demonstrates the high modeling accuracy achieved by the proposed approach, including the specified dissipativity.

[27]  arXiv:1911.03886 [pdf, other]
Title: Machine Learning Based Channel Estimation: A Computational Approach for Universal Channel Conditions
Comments: 10 pages, 11 figures
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

Recently, machine learning has been introduced in communications to deal with channel estimation. Under non-linear system models, the superiority of machine learning based estimation has been demonstrated by simulation experiments, but the theoretical analysis is not sufficient, since the performance of machine learning, especially deep learning, is hard to analyze. This paper focuses on some theoretical problems in machine learning based channel estimation. As a data-driven method, a certain amount of training data is the prerequisite of a workable machine learning based estimation, and it is analyzed qualitatively from a statistical viewpoint in this paper. To deduce the exact sample size, we build a statistical model ignoring the exact structure of the learning module and then derive the relationship between sample size and learning performance. To verify our analysis, we employ machine learning based channel estimation in an OFDM system and apply two typical neural networks as the learning module: a single layer or linear structure and a three layer structure. The simulation results show that the analytical sample size is correct when the input dimension and the complexity of the learning module are low, but otherwise the true required sample size will be larger than the analytical result, since the influence of the two factors is not considered in the analysis of sample size. Also, we simulate the performance of machine learning based channel estimation under a quasi-stationary channel condition, where the explicit form of MMSE estimation is hard to obtain, and the simulation results exhibit the effectiveness and convenience of machine learning based channel estimation under complex channel models.

[28]  arXiv:1911.03887 [pdf, other]
Title: Deep Reinforcement Learning Based Dynamic Trajectory Control for UAV-assisted Mobile Edge Computing
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI); Machine Learning (stat.ML)

In this paper, we consider a platform of flying mobile edge computing (F-MEC), where unmanned aerial vehicles (UAVs) serve as equipment providing computation resources, and they enable task offloading from user equipment (UE). We aim to minimize the energy consumption of all the UEs by optimizing the user association, resource allocation and the trajectory of UAVs. To this end, we first propose a Convex optimizAtion based Trajectory control algorithm (CAT), which solves the problem in an iterative way by using the block coordinate descent (BCD) method. Then, to make real-time decisions while taking into account the dynamics of the environment (i.e., UAVs may take off from different locations), we propose a deep Reinforcement leArning based Trajectory control algorithm (RAT). In RAT, we apply Prioritized Experience Replay (PER) to improve the convergence of the training procedure. Unlike the convex optimization based algorithm, which may be susceptible to the initial points and requires iterations, RAT can be adapted to any take-off points of the UAVs and can obtain the solution more rapidly than CAT once the training process has been completed. Simulation results show that the proposed CAT and RAT achieve similar performance and both outperform traditional algorithms.

[29]  arXiv:1911.03929 [pdf, other]
Title: Positioning of Multiple Unmanned Aerial Vehicle Base Stations in future Wireless Network
Subjects: Signal Processing (eess.SP); Optimization and Control (math.OC)

Unmanned aerial vehicle (UAV) base stations (BSs) are a reliable and efficient alternative to fulfill the coverage and capacity requirements when the backbone network fails to provide such requirements due to disasters. In this paper, we consider the optimal UAV-deployment problem in 3D space for a mmWave network. The objective is to deploy multiple aerial BSs simultaneously to completely serve the ground users. We develop a novel algorithm to find the feasible positions for a set of UAV-BSs from a predefined set of locations, subject to a signal-to-interference-plus-noise ratio (SINR) constraint for every associated user, a limited hovering altitude constraint for each UAV-BS, and a restricted operating zone constraint. We cast this 3D positioning problem as an l_0 minimization problem. This is a combinatorial, NP-hard problem. We approximate the l_0 minimization problem as a non-combinatorial l_1-norm problem. We thereby provide a suboptimal algorithm to find a set of feasible locations for the UAV-BSs to operate. The analysis shows that the proposed algorithm obtains a set of locations to deploy multiple UAV-BSs simultaneously while satisfying the constraints.
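
The relaxation step described in this abstract can be illustrated in generic form. The following is only a sketch under stated assumptions: the selection vector x over N candidate locations and the constraint set C (standing in for the SINR, altitude, and operating-zone constraints) are illustrative symbols and are not taken from the paper, whose exact formulation may differ.

```latex
\begin{align}
\text{(combinatorial)} \quad & \min_{x \in \{0,1\}^{N}} \ \|x\|_0 \quad \text{s.t. } x \in \mathcal{C}, \\
\text{(relaxed)} \quad & \min_{x \in [0,1]^{N}} \ \|x\|_1 = \sum_{i=1}^{N} x_i \quad \text{s.t. } x \in \mathcal{C}.
\end{align}
```

Replacing the count of nonzero entries with the sum of entries removes the combinatorial search, at the cost that the relaxed solution is in general only suboptimal for the original problem.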

[30]  arXiv:1911.03930 [pdf, other]
Title: Robust Unsupervised Audio-visual Speech Enhancement Using a Mixture of Variational Autoencoders
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)

Recently, an audio-visual speech generative model based on a variational autoencoder (VAE) has been proposed, which is combined with a nonnegative matrix factorization (NMF) model for noise variance to perform unsupervised speech enhancement. When visual data is clean, speech enhancement with the audio-visual VAE shows better performance than with the audio-only VAE, which is trained on audio-only data. However, the audio-visual VAE is not robust against noisy visual data, e.g., when for some video frames the speaker's face is not frontal or the lips region is occluded. In this paper, we propose a robust unsupervised audio-visual speech enhancement method based on a per-frame VAE mixture model. This mixture model consists of a trained audio-only VAE and a trained audio-visual VAE. The motivation is to skip noisy visual frames by switching to the audio-only VAE model. We present a variational expectation-maximization method to estimate the parameters of the model. Experiments show the promising performance of the proposed method.

[31]  arXiv:1911.03955 [pdf]
Title: Distributed Recursive Filtering for Spatially Interconnected Systems with Randomly Occurred Missing Measurements
Authors: Bai Li
Subjects: Systems and Control (eess.SY); Robotics (cs.RO)

This paper proposes a distributed filter for spatially interconnected systems (SISs) that accounts for missing measurements in the sensors of sub-systems. An SIS is composed of many similar sub-systems that directly interact or communicate with connected neighbors. Although the interactions are simple and tractable, the overall SIS can exhibit rich and complex behaviors. In practice, sensors of sub-systems in a sensor network may occasionally break down, which unexpectedly makes part of the measurements unavailable. In this work, the distributed characteristics of SISs are described by the Andrea model, and measurement losses are assumed to occur with known probabilities. Experimental results confirm that this filtering method can be effectively employed for the state estimation of SISs when missing measurements occur.

[32]  arXiv:1911.03970 [pdf, other]
Title: Improved Large-margin Softmax Loss for Speaker Diarisation
Comments: Submitted to ICASSP 2020
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)

Speaker diarisation systems nowadays use embeddings generated from speech segments in a bottleneck layer, which need to be discriminative for unseen speakers. It is well-known that large-margin training can improve the generalisation ability to unseen data, and its use in such open-set problems has been widespread. Therefore, this paper introduces a general approach to the large-margin softmax loss without any approximations to improve the quality of speaker embeddings for diarisation. Furthermore, a novel and simple way to stabilise training when large-margin softmax is used is proposed. Finally, different training margins are used to reduce the negative effect that overlapping speech has on creating discriminative embeddings. Experiments on the AMI meeting corpus show that the use of large-margin softmax significantly improves the speaker error rate (SER). By using all hyperparameters of the loss in a unified way, further improvements were achieved, reaching a relative SER reduction of 24.6% over the baseline. However, the best result was achieved by training overlapping and single speaker speech samples with different margins, giving an overall 29.5% SER reduction relative to the baseline.

[33]  arXiv:1911.03972 [pdf]
Title: IrisNet: Deep Learning for Automatic and Real-time Tongue Contour Tracking in Ultrasound Video Data using Peripheral Vision
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)

The progress of deep convolutional neural networks has been successfully exploited in various real-time computer vision tasks such as image classification and segmentation. Owing to the development of computational units, availability of digital datasets, and improved performance of deep learning models, fully automatic and accurate tracking of tongue contours in real-time ultrasound data became practical only in recent years. Recent studies have shown that the performance of deep learning techniques is significant in the tracking of ultrasound tongue contours in real-time applications such as pronunciation training using multimodal ultrasound-enhanced approaches. Due to the high correlation between ultrasound tongue datasets, it is feasible to have a general model that accomplishes automatic tongue tracking for almost all datasets. In this paper, we propose a deep learning model comprising a convolutional module that mimics the peripheral vision ability of the human eye to handle real-time, accurate, and fully automatic tongue contour tracking tasks, applicable to almost all primary ultrasound tongue datasets. Qualitative and quantitative assessment of IrisNet on different ultrasound tongue datasets and PASCAL VOC2012 revealed its outstanding generalization compared with similar techniques.

[34]  arXiv:1911.03975 [pdf, other]
Title: Graph Neural Net using Analytical Graph Filters and Topology Optimization for Image Denoising
Comments: Image denoising, deep learning, analytical graph filter
Subjects: Image and Video Processing (eess.IV)

While convolutional neural nets (CNN) have achieved remarkable performance for a wide range of inverse imaging applications, the filter coefficients are computed in a purely data-driven manner and are not explainable. Inspired by an analytically derived CNN by Hadji et al., in this paper we construct a new layered graph convolutional neural net (GCNN) using GraphBio as our graph filter. Unlike convolutional filters in previous GNNs, our employed GraphBio is analytically defined and requires no training, and we optimize the end-to-end system only via learning of an appropriate graph topology at each layer. In signal filtering terms, this means that our linear graph filter at each layer is always interpretable as low-pass with known biorthogonal conditions, while the graph spectrum itself is optimized via data training. As an example application, we show that our analytical GCNN achieves image denoising performance comparable to a state-of-the-art CNN-based scheme when the training and testing data share the same statistics, and when they differ, our analytical GCNN outperforms it by more than 1dB in PSNR.

[35]  arXiv:1911.03988 [pdf, ps, other]
Title: Model-Free Learning of Optimal Ergodic Policies in Wireless Systems
Comments: 13 pages, 4 figures
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Signal Processing (eess.SP); Optimization and Control (math.OC); Machine Learning (stat.ML)

Learning optimal resource allocation policies in wireless systems can be effectively achieved by formulating finite dimensional constrained programs which depend on system configuration, as well as the adopted learning parameterization. The interest here is in cases where system models are unavailable, prompting methods that probe the wireless system with candidate policies, and then use observed performance to determine better policies. This generic procedure is difficult because of the need to cull accurate gradient estimates out of these limited system queries. This paper constructs and exploits smoothed surrogates of constrained ergodic resource allocation problems, the gradients of the former being representable exactly as averages of finite differences that can be obtained through limited system probing. Leveraging this unique property, we develop a new model-free primal-dual algorithm for learning optimal ergodic resource allocations, while we rigorously analyze the relationships between original policy search problems and their surrogates, in both primal and dual domains. First, we show that both primal and dual domain surrogates are uniformly consistent approximations of their corresponding original finite dimensional counterparts. Upon further assuming the use of near-universal policy parameterizations, we also develop explicit bounds on the gap between optimal values of initial, infinite dimensional resource allocation problems, and dual values of their parameterized smoothed surrogates. In fact, we show that this duality gap decreases at a linear rate relative to smoothing and universality parameters. Thus, it can be made arbitrarily small at will, also justifying our proposed primal-dual algorithmic recipe. Numerical simulations confirm the effectiveness of our approach.
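
The idea of obtaining gradients of a smoothed surrogate as averages of finite differences can be sketched with a standard Gaussian-smoothing (zeroth-order) estimator. This is a minimal illustration and not the paper's primal-dual algorithm: the function name `smoothed_gradient`, the smoothing parameter `mu`, and the probe count are all hypothetical choices made for this example.

```python
import random


def smoothed_gradient(f, x, mu=1e-2, num_probes=1000, rng=None):
    """Estimate the gradient of a Gaussian-smoothed version of f at x.

    Only evaluations (probes) of f are used; no analytic gradient is needed.
    All names and defaults here are illustrative assumptions.
    """
    rng = rng or random.Random(0)
    base = f(x)  # unperturbed evaluation, reused for every finite difference
    grad = [0.0] * len(x)
    for _ in range(num_probes):
        # Random Gaussian direction in which the system is probed.
        u = [rng.gauss(0.0, 1.0) for _ in x]
        probe = f([xi + mu * ui for xi, ui in zip(x, u)])
        # Finite difference along u, scaled back onto the direction u.
        scale = (probe - base) / mu
        for i, ui in enumerate(u):
            grad[i] += scale * ui / num_probes
    return grad
```

For a linear objective the average converges to the true gradient as the number of probes grows; for general objectives it approximates the gradient of the smoothed surrogate rather than of the original function.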

[36]  arXiv:1911.03994 [pdf, ps, other]
Title: Real Signal Equalization for OQAM
Comments: Accepted in IEEE Signal Processing Letters. For Educational Purposes Only
Subjects: Signal Processing (eess.SP)

This correspondence proposes the use of a real-only equalizer (ROE), which acts on real signals derived from the received offset quadrature amplitude modulation (OQAM) symbols. For the same fading channel, we prove that both ROE and the widely linear equalizer (WLE) yield equivalent outputs. Hence, these exhibit the same performance. Our complexity analysis finds that depending on the frame length, ROE can be computationally less complex, and save significant signal processing time over WLE. In the adaptive normalized least mean square implementation, ROE performs better with lower complexity than its counterpart, for a given number of pilot bits.

[37]  arXiv:1911.04019 [pdf, other]
Title: Non-Redundant OFDM Receiver Windowing for 5G Frames & Beyond
Comments: 10 pages, 5 figures comprising 3,2,1,2,2 subfigures and 4 author biographies. Final version accepted for publication in IEEE Transactions on Vehicular Technology
Subjects: Signal Processing (eess.SP)

Contemporary receiver windowed orthogonal frequency division multiplexing (RW-OFDM) algorithms have limited adjacent channel interference (ACI) rejection capability under high delay spread and small Fast Fourier Transform (FFT) sizes. Cyclic prefix (CP) is designed to be longer than the maximum excess delay (MED) of the channel to accommodate such algorithms in current standards. The robustness of these algorithms can only be improved against these conditions by adopting additional extensions in a new backward incompatible standard. Such extensions would deteriorate the performance of high mobility vehicular communication systems in particular. In this paper, we present a low-complexity Hann RW-OFDM scheme that provides resistance against ACI without requiring any intersymbol interference (ISI)-free redundancies. While this scheme is backward compatible with current and legacy standards and requires no changes to the conventionally transmitted signals, it also paves the way towards future spectrotemporally localized and efficient schemes suitable for higher mobility vehicular communications. A Hann window effectively rejects unstructured ACI at the expense of structured and limited intercarrier interference (ICI) across data carriers. A simple maximum ratio combining successive interference cancellation (MRC-SIC) receiver is therefore proposed to resolve this induced ICI and receive symbols transmitted by standard transmitters currently in use. The computational complexity of the proposed scheme is comparable to that of contemporary RW-OFDM algorithms, while ACI rejection and bit-error rate performance is superior in both long and short delay spreads. Channel estimation using Hann RW-OFDM symbols is also discussed.
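
For readers unfamiliar with the window itself, the Hann taper can be sketched as follows. This shows only the textbook symmetric Hann window applied to a block of samples; it does not reproduce the paper's receiver or its MRC-SIC stage, and the helper names `hann_window` and `apply_window` are hypothetical.

```python
import math


def hann_window(n):
    """Return the n-point symmetric Hann window as a list of floats."""
    if n < 1:
        raise ValueError("window length must be positive")
    if n == 1:
        return [1.0]
    return [0.5 * (1.0 - math.cos(2.0 * math.pi * k / (n - 1))) for k in range(n)]


def apply_window(samples, window):
    """Multiply a block of (real or complex) samples by the window, element-wise."""
    if len(samples) != len(window):
        raise ValueError("samples and window must have the same length")
    return [s * w for s, w in zip(samples, window)]
```

The window tapers smoothly to zero at both edges of the block, which is what suppresses spectral leakage from adjacent channels.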

[38]  arXiv:1911.04072 [pdf, other]
Title: Compressed Underwater Acoustic Communications for Dynamic Interaction with Underwater Vehicles
Subjects: Systems and Control (eess.SY)

Underwater vehicles are utilized in various applications including underwater data-collection missions. The tethered connection constrains the mission both in distance traveled and number of vehicles that can run in the same area, while the addition of acoustic communications onto the vehicles grants them several functionalities. However, due to the low bandwidth of the underwater acoustic channel (which leads to low data rates) and the time overhead imposed by both the channel propagation delay and the processing delay of the acoustic modems, efficient protocols are required. In this paper, an implicit data-compression and transmission protocol is proposed to carry out environmental monitoring missions such as adaptive sampling of physical and chemical parameters in the water. In a semi-autonomous manner between the vehicle and the control center, both sides keep silent in data transmission as long as they can estimate and predict the actions of the other side, unless environmental data and/or kinematic data are found to be unpredictable. Our design puts the human in the loop to send high-level control commands. Experiments were conducted using an autonomous vehicle with WHOI micro-modems in the Raritan River in Somerset, in Carnegie Lake in Princeton, and in the Marine Park in Red Bank, all in New Jersey.

[39]  arXiv:1911.04080 [pdf, other]
Title: Real-time Image Enhancement for Vision-based Autonomous Underwater Vehicle Navigation in Murky Waters
Subjects: Image and Video Processing (eess.IV)

Classic vision-based navigation solutions, which are utilized in algorithms such as Simultaneous Localization and Mapping (SLAM), usually fail to work underwater when the water is murky and the quality of the recorded images is low. That is because most SLAM algorithms are feature-based techniques and often it is impossible to extract the matched features from blurry underwater images. To get more useful features, image processing techniques can be used to dehaze the images before they are used in a navigation/localization algorithm. There are many well-developed methods for image restoration, but the degree of enhancement and the resource cost of the methods are different. In this paper, we propose a new visual SLAM, specifically designed for the underwater environment, using Generative Adversarial Networks (GANs) to enhance the quality of underwater images with underwater image quality evaluation metrics. This procedure increases the efficiency of SLAM and yields better navigation and localization accuracy. We evaluate the proposed GANs-SLAM combination by using different images with various levels of turbidity in the water. Experiments were conducted and the data was extracted from Carnegie Lake in Princeton and the Raritan River, both in New Jersey, USA.

[40]  arXiv:1911.04096 [pdf, other]
Title: UW-MARL: Multi-Agent Reinforcement Learning for Underwater Adaptive Sampling using Autonomous Vehicles
Subjects: Systems and Control (eess.SY)

Near-real-time water-quality monitoring of different variables in uncertain environments such as rivers, lakes, and water reservoirs is critical to protect the aquatic life and to prevent further propagation of potential pollution in the water. In order to measure the physical values in a region of interest, adaptive sampling is helpful as an energy- and time-efficient technique since an exhaustive search of an area is not feasible with a single vehicle. We propose an adaptive sampling algorithm using multiple autonomous vehicles, which are well-trained as agents in a Multi-Agent Reinforcement Learning (MARL) framework to make an efficient sequence of decisions on the adaptive sampling procedure. The proposed solution is evaluated using experimental data, which is fed into a simulation framework. Experiments were conducted in the Raritan River, Somerset and in Carnegie Lake, Princeton, NJ during July 2019.

[41]  arXiv:1911.04133 [pdf, other]
Title: IMNet: A Learning Based Detector for Index Modulation Aided MIMO-OFDM systems
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)

Index modulation (IM) reduces the power consumption and complexity of the transmitter in classical multiple-input multiple-output orthogonal frequency division multiplexing (MIMO-OFDM) systems. However, due to the introduction of IM, the complexity of the detector at the receiver is greatly increased. Furthermore, the detector also requires the channel state information at the receiver, which leads to high system overhead. To tackle these challenges, in this paper, we introduce deep learning (DL) in designing a non-iterative detector. Specifically, based on the structural sparsity of the transmitted signal in IM aided MIMO-OFDM systems, we first formulate the detection process as a sparse reconstruction problem. Then, a DL based detector called IMNet, which combines two subnets with the traditional least square method, is designed to recover the transmitted signal. To the best of our knowledge, this is the first attempt to design a DL based detector for IM aided systems. Finally, to verify the adaptability and robustness of IMNet, simulations are carried out with consideration of correlated MIMO channels. The simulation results demonstrate that the proposed IMNet outperforms existing algorithms in terms of bit error rate and computational complexity under various scenarios.

[42]  arXiv:1911.04153 [pdf, ps, other]
Title: Optimal Tracking for Partially-Unknown Continuous Time Nonlinear Systems with Actuator Constraints using Critic-Only Integral Reinforcement Learning
Subjects: Systems and Control (eess.SY)

A novel integral reinforcement learning (IRL) algorithm leveraging a variable gain gradient descent and incorporating a stabilizing term is presented in this paper. With these modifications, the IRL tracking controller can be implemented using just a single neural network (NN). The novel features of the update law include a 'variable learning rate' that is a function of the instantaneous Hamilton-Jacobi-Bellman (HJB) error and the rate of variation of the Lyapunov function along the system trajectories. The parameter update law does not require an initial stabilizing controller to initiate the policy iteration process. Additionally, drift dynamics are not required for either the parameter update law or the generation of the control policy. The update law guarantees the convergence of augmented states and error in NN weights to a much tighter set around the origin, and uniform ultimate boundedness (UUB) stability is proved. The control algorithm proposed in this paper is validated on a full 6-DoF UAV model for attitude control, and simulation results presented at the end establish the efficacy of the proposed control scheme.

[43]  arXiv:1911.04157 [pdf, ps, other]
Title: Variable Gain Gradient Descent-based Robust Reinforcement Learning for Optimal Tracking Control of Unknown Nonlinear System with Input-Constraints
Subjects: Systems and Control (eess.SY)

In recent times, a variety of Reinforcement Learning (RL) algorithms have been proposed for the optimal tracking problem of continuous time nonlinear systems with input constraints. Most of these algorithms are based on the notion of uniform ultimate boundedness (UUB) stability, in which higher learning rates are normally avoided in order to restrict oscillations in state error to smaller values. However, this comes at the cost of higher convergence time of critic neural network weights. This paper addresses that problem by proposing a novel tuning law containing a variable gain gradient descent for the critic neural network that can adjust the learning rate based on the Hamilton-Jacobi-Bellman (HJB) error and the instantaneous rate of variation of the Lyapunov function along augmented system trajectories. By allowing a high learning rate, the proposed variable gain gradient descent tuning law could improve the convergence time of critic neural network weights. Simultaneously, it also results in a tighter residual set, to which trajectories of the augmented system converge, leading to smaller oscillations in state error. A tighter bound for UUB stability of the proposed update mechanism is proved. In order to obviate the requirement of nominal dynamics, a neural network based identifier is chosen from existing literature that precedes the RL controller. Numerical studies are then presented to validate the effectiveness of the combined identifier and robust Reinforcement Learning control scheme in controlling a continuous time nonlinear system.

[44]  arXiv:1911.04228 [pdf, ps, other]
Title: Unsupervised Training for Deep Speech Source Separation with Kullback-Leibler Divergence Based Probabilistic Loss Function
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

In this paper, we propose a multi-channel speech source separation method with a deep neural network (DNN) which is trained under the condition that no clean signal is available. As an alternative to a clean signal, the proposed method adopts a speech signal estimated by an unsupervised speech source separation with a statistical model. As a statistical model of the microphone input signal, we adopt a time-varying spatial covariance matrix (SCM) model which includes reverberation and background noise submodels so as to achieve robustness against reverberation and background noise. The DNN infers intermediate variables which are needed for constructing the time-varying SCM. Speech source separation is performed in a probabilistic manner so as to avoid overfitting to separation error. Since there are multiple intermediate variables, a loss function which evaluates a single intermediate variable is not applicable. Instead, the proposed method adopts a loss function which evaluates the output probabilistic signal directly based on Kullback-Leibler Divergence (KLD). The gradient of the loss function can be back-propagated into the DNN through all the intermediate variables. Experimental results under reverberant conditions show that the proposed method can train the DNN efficiently even when the number of training utterances is small, i.e., 1K.

[45]  arXiv:1911.04239 [pdf, other]
Title: Hybrid Precoding for Multi-User Millimeter Wave Massive MIMO Systems: A Deep Learning Approach
Comments: Accepted paper in IEEE Transactions on Vehicular Technology, Oct 2019
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT); Machine Learning (cs.LG)

In multi-user millimeter wave (mmWave) multiple-input-multiple-output (MIMO) systems, hybrid precoding is a crucial task to lower the complexity and cost while achieving a sufficient sum-rate. Previous works on hybrid precoding were usually based on optimization or greedy approaches. These methods either incur higher complexity or have sub-optimum performance. Moreover, the performance of these methods mostly relies on the quality of the channel data. In this work, we propose a deep learning (DL) framework to improve the performance and reduce computation time as compared to conventional techniques. In fact, we design a convolutional neural network for MIMO (CNN-MIMO) that accepts as input an imperfect channel matrix and gives the analog precoder and combiners at the output. The procedure includes two main stages. First, we develop an exhaustive search algorithm to select the analog precoder and combiners from a predefined codebook maximizing the achievable sum-rate. Then, the selected precoder and combiners are used as output labels in the training stage of CNN-MIMO where the input-output pairs are obtained. We evaluate the performance of the proposed method through numerous and extensive simulations and show that the proposed DL framework outperforms conventional techniques. Overall, CNN-MIMO provides a robust hybrid precoding scheme in the presence of imperfections regarding the channel matrix. On top of this, the proposed approach exhibits less computation time in comparison to the optimization and codebook based approaches.

[46]  arXiv:1911.04244 [pdf, other]
Title: Boosting LSTM Performance Through Dynamic Precision Selection
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

The use of low numerical precision is a fundamental optimization included in modern accelerators for Deep Neural Networks (DNNs). The number of bits of the numerical representation is set to the minimum precision that is able to retain accuracy based on an offline profiling, and it is kept constant for DNN inference.
In this work, we explore the use of dynamic precision selection during DNN inference. We focus on Long Short Term Memory (LSTM) networks, which represent the state-of-the-art networks for applications such as machine translation and speech recognition. Unlike conventional DNNs, LSTM networks remember information from previous evaluations by storing data in the LSTM cell state. Our key observation is that the cell state determines the amount of precision required: time steps where the cell state changes significantly require higher precision, whereas time steps where the cell state is stable can be computed with lower precision without any loss in accuracy.
Based on this observation, we implement a novel hardware scheme that tracks the evolution of the elements in the LSTM cell state and dynamically selects the appropriate precision in each time step. For a set of popular LSTM networks, our scheme selects the lowest precision for more than 66% of the time, outperforming systems that fix the precision statically. We evaluate our proposal on top of a modern accelerator highly optimized for LSTM computation, and show that it provides 1.56x speedup and 23% energy savings on average without any loss in accuracy. The extra hardware to determine the appropriate precision represents a small area overhead of 8.8%.
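The key observation can be illustrated with a small software sketch, assuming a simple rule: use the low bit-width when the largest absolute change across cell-state elements stays below a threshold. The names `select_precision` and `precision_schedule`, the threshold, and the bit-widths are hypothetical; the abstract does not describe the actual hardware mechanism at this level of detail.

```python
from typing import List, Sequence


def select_precision(
    prev_cell: Sequence[float],
    curr_cell: Sequence[float],
    threshold: float = 0.1,  # hypothetical stability threshold
    low_bits: int = 4,       # hypothetical low precision
    high_bits: int = 8,      # hypothetical high precision
) -> int:
    """Pick a bit-width from how much the cell state changed between two steps."""
    max_change = max(abs(c - p) for p, c in zip(prev_cell, curr_cell))
    # A stable cell state tolerates the lower precision.
    return low_bits if max_change < threshold else high_bits


def precision_schedule(cell_states: List[Sequence[float]], **kwargs) -> List[int]:
    """Apply the rule to every pair of consecutive cell states."""
    return [
        select_precision(prev, curr, **kwargs)
        for prev, curr in zip(cell_states, cell_states[1:])
    ]
```

For example, `precision_schedule([[0.0, 0.0], [0.01, 0.02], [0.5, 0.02], [0.52, 0.03]])` selects the low bit-width for the first and third transitions and the high bit-width for the second, where one element jumps.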

[47]  arXiv:1911.04255 [pdf, other]
Title: Decoding Imagined Speech and Computer Control using Brain Waves
Subjects: Signal Processing (eess.SP); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)

In this work, we explore the possibility of decoding Imagined Speech brain waves using machine learning techniques. We propose a covariance matrix of Electroencephalogram channels as input features, projection to tangent space of covariance matrices for obtaining vectors from covariance matrices, principal component analysis for dimension reduction of vectors, an artificial feed-forward neural network as a classification model and bootstrap aggregation for creating an ensemble of neural network models. After the classification, two different Finite State Machines are designed that create an interface for controlling a computer system using an Imagined Speech-based BCI system. The proposed approach is able to decode the Imagined Speech signal with a maximum mean classification accuracy of 85% on the binary classification task of a long word versus a short word. We also show that our proposed approach is able to differentiate between imagined speech brain signals and rest state brain signals with a maximum mean classification accuracy of 94%. We compared our proposed method with other approaches for decoding imagined speech and show that our approach performs on par with the state-of-the-art approach on decoding long vs. short words and outperforms it significantly on the other two tasks of decoding three short words and three vowels, with an average margin of 11% and 9%, respectively. We also obtain an information transfer rate of 21 bits per minute when using an IS based system to operate a computer. These results show that the proposed approach is able to decode a wide variety of imagined speech signals without any human-designed features.
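The abstract does not say how the information transfer rate is computed. One widely used definition in the BCI literature is the Wolpaw formula; the sketch below assumes that definition, and the function names and inputs are illustrative rather than taken from the paper.

```python
import math


def bits_per_selection(num_classes: int, accuracy: float) -> float:
    """Wolpaw information transfer per selection, in bits.

    B = log2(N) + P*log2(P) + (1 - P)*log2((1 - P) / (N - 1))
    """
    if num_classes < 2:
        raise ValueError("num_classes must be at least 2")
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must lie in [0, 1]")
    n, p = num_classes, accuracy
    bits = math.log2(n)
    # Terms with a zero coefficient are skipped (the limit of x*log2(x) is 0).
    if p > 0.0:
        bits += p * math.log2(p)
    if p < 1.0:
        bits += (1.0 - p) * math.log2((1.0 - p) / (n - 1))
    return bits


def bits_per_minute(num_classes: int, accuracy: float, selections_per_minute: float) -> float:
    """Scale the per-selection rate by how many selections are made per minute."""
    return bits_per_selection(num_classes, accuracy) * selections_per_minute
```

Under this definition a perfectly accurate binary choice carries one bit per selection, and chance-level accuracy carries none, so the reported rate depends on both the classification accuracy and the selection speed.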

[48]  arXiv:1911.04263 [pdf]
Title: AI-Based Autonomous Line Flow Control via Topology Adjustment for Maximizing Time-Series ATCs
Comments: The paper has been submitted to IEEE PES GM 2020
Subjects: Signal Processing (eess.SP)

This paper presents a novel AI-based approach for maximizing time-series available transfer capabilities (ATCs) via autonomous topology control considering various practical constraints and uncertainties. Several AI techniques including supervised learning and deep reinforcement learning (DRL) are adopted and improved to train effective AI agents for achieving the desired performance. First, imitation learning (IL) is used to provide a good initial policy for the AI agent. Then, the agent is trained by DRL algorithms with a novel guided exploration technique, which significantly improves the training efficiency. Finally, an Early Warning (EW) mechanism is designed to help the agent find good topology control strategies for long testing periods. It helps the agent determine action timing using power system domain knowledge, thus effectively increasing the system's error tolerance and robustness. The effectiveness of the proposed approach is demonstrated in the "2019 Learn to Run a Power Network (L2RPN)" global competition, where the developed AI agents continuously and safely controlled a power grid to maximize ATCs without operator intervention on up to one month of operation data and eventually won first place in both the development and final phases of the competition. The winning agent has been open-sourced on GitHub.

[49]  arXiv:1911.04289 [pdf, other]
Title: Relevance Vector Machines for harmonization of MRI brain volumes using image descriptors
Comments: 9 pages, 4 figures. Presented at the International Workshop on Machine Learning in Clinical Neuroimaging (MLCN) 2019
Journal-ref: OR 2.0 Context-Aware Operating Theaters and Machine Learning in Clinical Neuroimaging. OR 2.0 2019, MLCN 2019. Lecture Notes in Computer Science, vol 11796. Springer, Cham
Subjects: Image and Video Processing (eess.IV); Machine Learning (cs.LG); Machine Learning (stat.ML)

With the increased need for multi-center magnetic resonance imaging studies, problems arise related to differences in hardware and software between centers. Namely, current algorithms for brain volume quantification are unreliable for the longitudinal assessment of volume changes in this type of setting. Currently most methods attempt to decrease this issue by regressing the scanner- and/or center-effects from the original data. In this work, we explore a novel approach to harmonize brain volume measurements by using only image descriptors. First, we explore the relationships between volumes and image descriptors. Then, we train a Relevance Vector Machine (RVM) model over a large multi-site dataset of healthy subjects to perform volume harmonization. Finally, we validate the method over two different datasets: i) a subset of unseen healthy controls; and ii) a test-retest dataset of multiple sclerosis (MS) patients. The method decreases scanner and center variability while preserving measurements that did not require correction in MS patient data. We show that image descriptors can be used as input to a machine learning algorithm to improve the reliability of longitudinal volumetric studies.

[50]  arXiv:1911.04291 [pdf, other]
Title: Machine Learning-Based Adaptive Receive Filtering: Proof-of-Concept on an SDR Platform
Comments: submitted to ICC 2020
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT); Machine Learning (cs.LG); Machine Learning (stat.ML)

Conventional multiuser detection techniques either require a large number of antennas at the receiver for a desired performance, or they are too complex for practical implementation. Moreover, many of these techniques, such as successive interference cancellation (SIC), suffer from errors in parameter estimation (user channels, covariance matrix, noise variance, etc.) that is performed before detection of user data symbols. As an alternative to conventional methods, this paper proposes and demonstrates a low-complexity practical Machine Learning (ML) based receiver that achieves similar (and at times better) performance to the SIC receiver. The proposed receiver does not require parameter estimation; instead it uses supervised learning to detect the user modulation symbols directly. We perform comparisons with minimum mean square error (MMSE) and SIC receivers in terms of symbol error rate (SER) and complexity.

[51]  arXiv:1911.04357 [pdf]
Title: Limited View and Sparse Photoacoustic Tomography for Neuroimaging with Deep Learning
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Medical Physics (physics.med-ph)

Photoacoustic tomography (PAT) is a nonionizing imaging modality capable of acquiring high contrast and resolution images of optical absorption at depths greater than traditional optical imaging techniques. Practical considerations with instrumentation and geometry limit the number of available acoustic sensors and their view of the imaging target, resulting in significant image reconstruction artifacts that degrade image quality. Iterative reconstruction methods can be used to reduce artifacts but are computationally expensive. In this work, we propose a novel deep learning approach termed pixelwise deep learning (PixelDL) that first employs pixelwise interpolation governed by the physics of photoacoustic wave propagation and then uses a convolutional neural network to directly reconstruct an image. Simulated photoacoustic data from synthetic vasculature phantom and mouse-brain vasculature were used for training and testing, respectively. Results demonstrated that PixelDL achieved comparable performance to iterative methods and outperformed other CNN-based approaches for correcting artifacts. PixelDL is a computationally efficient approach that enables real-time PAT rendering and improved image quality, quantification, and interpretation.

[52]  arXiv:1911.04379 [pdf]
Title: Modeling EEG data distribution with a Wasserstein Generative Adversarial Network to predict RSVP Events
Subjects: Image and Video Processing (eess.IV); Machine Learning (cs.LG); Applications (stat.AP); Machine Learning (stat.ML)

Electroencephalography (EEG) data are difficult to obtain due to complex experimental setups and reduced comfort with prolonged wearing. This makes it challenging to train powerful deep learning models with the limited EEG data. Being able to generate EEG data computationally could address this limitation. We propose a novel Wasserstein Generative Adversarial Network with gradient penalty (WGAN-GP) to synthesize EEG data. This network addresses several modeling challenges of simulating time-series EEG data including frequency artifacts and training instability. We further extended this network to a class-conditioned variant that also includes a classification branch to perform event-related classification. We trained the proposed networks to generate one- and 64-channel data resembling EEG signals routinely seen in a rapid serial visual presentation (RSVP) experiment and demonstrated the validity of the generated samples. We also tested intra-subject cross-session classification performance for classifying the RSVP target events and showed that class-conditioned WGAN-GP can achieve improved event-classification performance over EEGNet.

[53]  arXiv:1911.04410 [pdf]
Title: A deep learning framework for morphologic detail beyond the diffraction limit in infrared spectroscopic imaging
Comments: 14 pages, 7 figures
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Infrared (IR) microscopes measure spectral information that quantifies molecular content to assign the identity of biomedical cells but lack the spatial quality of optical microscopy to appreciate morphologic features. Here, we propose a method to utilize the semantic information of cellular identity from IR imaging with the morphologic detail of pathology images in a deep learning-based approach to image super-resolution. Using Generative Adversarial Networks (GANs), we enhance the spatial detail in IR imaging beyond the diffraction limit while retaining its spectral contrast. This technique can be rapidly integrated with modern IR microscopes to provide a framework useful for routine pathology.

[54]  arXiv:1911.04426 [pdf, other]
Title: A Routing and Link Scheduling Strategy for Smart Grid NAN Communications
Subjects: Systems and Control (eess.SY)

As large scale deployment of smart devices in the power grid continues, research efforts need to increasingly focus on efficient communication of generated information. This paper describes a strategy for static routing and scheduling of messages in a multi-hop wireless Smart Grid Neighborhood Area Network (NAN) with multiple source nodes and a common set of destinations or gateways. The problem is formulated as a Mixed Integer Linear Program (MILP) and solved using commercial optimization solver CPLEX. Feasibility of the scheme is demonstrated using different network models, constraints, message injection rates, and initial conditions. It is shown that the proposed approach can be used to generate an optimal link schedule for collecting user-generated bids in a transactive energy market in the least possible time. It is also shown that the methodology is applicable to multiple destination nodes and that their location affects message delivery time.

[55]  arXiv:1911.04440 [pdf, other]
Title: Proactive Islanding of the Power Grid to Mitigate High-Impact Low-Frequency Events
Subjects: Systems and Control (eess.SY)

This paper proposes a methodology for enhancing power systems resiliency by proactively splitting an interconnected grid into small self-sustaining islands in preparation for extreme events. The idea is to posture the system so that cascading outages can be confined to the affected areas, preventing the propagation of disturbances to the rest of the system. This mitigation strategy will prove especially useful when advance notification of a threat is available but its nature is not well understood. In our method, islands are determined using a constrained hierarchical spectral clustering technique. We further check the viability of the resultant islands using steady-state AC power flow. The performance of the approach is illustrated using a detailed PSS/E model of the heavily meshed transmission network operated by PJM Interconnection in the eastern USA. Representative cases from different seasons show that variations in power flow patterns influence island configuration.

Cross-lists for Tue, 12 Nov 19

[56]  arXiv:1602.01969 (cross-list from math.OC) [pdf, other]
Title: Voltage stress minimization by optimal reactive power control
Comments: 10 pages, 9 figures
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)

A standard operational requirement in power systems is that the voltage magnitudes lie within prespecified bounds. Conventional engineering wisdom suggests that such a tightly-regulated profile, imposed for system design purposes and good operation of the network, should also guarantee a secure system, operating far from static bifurcation instabilities such as voltage collapse. In general however, these two objectives are distinct and must be separately enforced. We formulate an optimization problem which maximizes the distance to voltage collapse through injections of reactive power, subject to power flow and operational voltage constraints. By exploiting a linear approximation of the power flow equations we arrive at a convex reformulation which can be efficiently solved for the optimal injections. We also address the planning problem of allocating the resources by recasting our problem in a sparsity-promoting framework that allows us to choose a desired trade-off between optimality of injections and the number of required actuators. Finally, we present a distributed algorithm to solve the optimization problem, showing that it can be implemented on-line as a feedback controller. We illustrate the performance of our results with the IEEE30 bus network.

[57]  arXiv:1707.08243 (cross-list from cs.SY) [pdf, ps, other]
Title: A Graphical Characterization of Structurally Controllable Linear Systems with Dependent Parameters
Journal-ref: IEEE Transactions on Automatic Control, 2019
Subjects: Systems and Control (eess.SY)

One version of the concept of structural controllability defined for single-input systems by Lin and subsequently generalized to multi-input systems by others, states that a parameterized matrix pair $(A, B)$ whose nonzero entries are distinct parameters, is structurally controllable if values can be assigned to the parameters which cause the resulting matrix pair to be controllable. In this paper the concept of structural controllability is broadened to allow for the possibility that a parameter may appear in more than one location in the pair $(A, B)$. Subject to a certain condition on the parameterization called the "binary assumption", an explicit graph-theoretic characterization of such matrix pairs is derived.

[58]  arXiv:1710.10829 (cross-list from math.OC) [pdf, ps, other]
Title: Generalized gradient optimization over lossy networks for partition-based estimation
Comments: 20 pages, 5 figures
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)

We address the problem of distributed convex unconstrained optimization over networks characterized by asynchronous and possibly lossy communications. We analyze the case where the global cost function is the sum of locally coupled local strictly convex cost functions. As discussed in detail in a motivating example, this class of optimization objectives is typical in localization problems and in partition-based state estimation. Inspired by a generalized gradient descent strategy, namely the block Jacobi iteration, we propose a novel solution which is amenable to a distributed implementation and which, under a suitable condition on the step size, is provably locally resilient to communication failures. The theoretical analysis relies on the separation of time scales and Lyapunov theory. In addition, to show the flexibility of the proposed algorithm, we derive a resilient gradient descent iteration and a resilient generalized gradient for quadratic programming as two natural particularizations of our strategy. In this second case, global robustness is provided. Finally, the proposed algorithm is numerically tested on the IEEE 123-node distribution feeder in the context of partition-based smart grid robust state estimation in the presence of measurement outliers.

[59]  arXiv:1904.01499 (cross-list from cs.SY) [pdf, ps, other]
Title: On the Existence of a Fixed Spectrum for a Multi-channel Linear System: A Matroid Theory Approach
Subjects: Systems and Control (eess.SY)

Conditions for the existence of a fixed spectrum (i.e., the set of fixed modes) for a multi-channel linear system have been known for a long time. The aim of this paper is to reestablish one of these conditions using a new and transparent approach based on matroid theory.

[60]  arXiv:1911.03462 (cross-list from cs.CV) [pdf, other]
Title: Knowledge Distillation for Incremental Learning in Semantic Segmentation
Comments: 13 pages, 6 figures, 14 tables. arXiv admin note: substantial text overlap with arXiv:1907.13372
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)

Although deep learning architectures have shown remarkable results in scene understanding problems, they exhibit a critical drop of overall performance due to catastrophic forgetting when they are required to incrementally learn to recognize new classes without forgetting the old ones. This phenomenon affects the deployment of artificial intelligence in real world scenarios where systems need to learn new and different representations over time. Current approaches for incremental learning deal only with the image classification and object detection tasks. In this work we formally introduce the incremental learning problem for semantic segmentation. To avoid catastrophic forgetting we propose to distill the knowledge of the previous model to retain the information about previously learned classes, whilst updating the current model to learn the new ones. We developed three main methodologies of knowledge distillation working on both the output layers and the internal feature representations. Furthermore, unlike other recent frameworks, we do not store any image from the previous training stages; only the last model is used to preserve high accuracy on previously learned classes. Extensive experiments were conducted on the Pascal VOC2012 dataset and show the effectiveness of the proposed approaches in different incremental learning scenarios.

[61]  arXiv:1911.03540 (cross-list from cs.NE) [pdf, ps, other]
Title: Cross-subject Decoding of Eye Movement Goals from Local Field Potentials
Subjects: Neural and Evolutionary Computing (cs.NE); Signal Processing (eess.SP)

Objective. We consider the cross-subject decoding problem from local field potential (LFP) activity, where training data collected from the pre-frontal cortex of a subject (source) is used to decode intended motor actions in another subject (destination). Approach. We propose a novel pre-processing technique, referred to as data centering, which is used to adapt the feature space of the source to the feature space of the destination. The key ingredients of data centering are the transfer functions used to model the deterministic component of the relationship between the source and destination feature spaces. We also develop an efficient data-driven estimation approach for linear transfer functions that uses the first and second order moments of the class-conditional distributions. Main result. We apply our techniques for cross-subject decoding of eye movement directions in an experiment where two macaque monkeys perform memory-guided visual saccades to one of eight target locations. The results show peak cross-subject decoding performance of $80\%$, which marks a substantial improvement over a random-choice decoder. Significance. The analyses presented herein demonstrate that data centering is a viable novel technique for reliable cross-subject brain-computer interfacing.

[62]  arXiv:1911.03552 (cross-list from physics.optics) [pdf]
Title: Electromagnetically induced transparency at a chiral exceptional point
Comments: 22 pages, 4 figures, 44 references
Subjects: Optics (physics.optics); Systems and Control (eess.SY); Classical Physics (physics.class-ph); Quantum Physics (quant-ph)

Electromagnetically induced transparency, as a quantum interference effect to eliminate optical absorption in an opaque medium, has found extensive applications in slow light generation, optical storage, frequency conversion, optical quantum memory as well as enhanced nonlinear interactions at the few-photon level in all kinds of systems. Recently, there has been great interest in exceptional points, spectral singularities that can be reached by tuning various parameters in open systems, as a way to impart unusual features to physical systems, such as optical states with chirality. Here we theoretically and experimentally study transparency and absorption modulated by chiral optical states at exceptional points in an indirectly-coupled resonator system. By tuning one resonator to an exceptional point, transparency or absorption occurs depending on the chirality of the eigenstate. Our results demonstrate a new strategy to manipulate the light flow and the spectra of a photonic resonator system by exploiting a discrete optical state associated with specific chirality at an exceptional point as a unique control bit, which opens up a new horizon of controlling slow light using optical states. Compatible with the idea of state control in quantum gate operation, this strategy hence bridges optical computing and storage.

[63]  arXiv:1911.03565 (cross-list from cs.CV) [pdf]
Title: Vision-Based Lane-Changing Behavior Detection Using Deep Residual Neural Network
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Accurate lane localization and lane change detection are crucial in advanced driver assistance systems and autonomous driving systems for safer and more efficient trajectory planning. Conventional localization devices such as the Global Positioning System only provide road-level resolution for car navigation, which is insufficient to assist in lane-level decision making. The state-of-the-art technique for lane localization is to use Light Detection and Ranging (LiDAR) sensors to correct the global localization error and achieve centimeter-level accuracy, but real-time implementation and wide adoption of LiDAR are still limited by its computational burden and current cost. As a cost-effective alternative, vision-based lane change detection has been highly regarded for affordable autonomous vehicles to support lane-level localization. A deep learning-based computer vision system is developed to detect the lane change behavior using the images captured by a front-view camera mounted on the vehicle and data from the inertial measurement unit for highway driving. Testing results on real-world driving data show that the proposed method is robust, works in real time, and achieves around 87% lane change detection accuracy. Compared to the average human reaction to visual stimuli, the proposed computer vision system works 9 times faster, which makes it capable of helping make life-saving decisions in time.

[64]  arXiv:1911.03573 (cross-list from physics.med-ph) [pdf]
Title: T2-weighted Spine Imaging using a Single-Shot Turbo Spin Echo Pulse Sequence
Comments: 13 pages, 7 figures, 4 tables, Previously submitted to "Investigative Radiology"
Subjects: Medical Physics (physics.med-ph); Image and Video Processing (eess.IV)

T2 weighted imaging of the spine is commonly performed using fast spin echo (FSE/TSE) based sequences, resulting in long scan times and vulnerability to motion artifacts. While single shot fast spin echo sequences have been attempted, their adoption has been limited by poor spatial resolution and specific absorption rate (SAR) limitations. We investigate the use of a half-Fourier acquisition single-shot turbo spin-echo variable flip angle (HASTE-VFA) sequence for T2 weighted spine imaging. A variable refocusing flip angle echo train was first optimized for the spine to improve the point spread function (PSF) and minimize SAR, yielding images with improved spatial resolution and signal-to-noise ratio compared to the constant flip angle sequence. Data was acquired from 29 patients (20 lumbar and thoracolumbar, 9 whole-spine) using conventional fast spin echo and the proposed variable flip angle single-shot sequences. All images were graded by two experienced neuroradiologists in a blinded fashion and scores were assigned based on blurring, motion, artifacts, and noise as well as appearance of disks, facet joints, end plates, nerve roots and the spinal cord.

[65]  arXiv:1911.03583 (cross-list from cs.LG) [pdf, other]
Title: Community-preserving Graph Convolutions for Structural and Functional Joint Embedding of Brain Networks
Subjects: Machine Learning (cs.LG); Image and Video Processing (eess.IV); Machine Learning (stat.ML)

Brain networks have received considerable attention given the critical significance for understanding human brain organization, for investigating neurological disorders and for clinical diagnostic applications. Structural brain network (e.g. DTI) and functional brain network (e.g. fMRI) are the primary networks of interest. Most existing works in brain network analysis focus on either structural or functional connectivity, which cannot leverage the complementary information from each other. Although multi-view learning methods have been proposed to learn from both networks (or views), these methods aim to reach a consensus among multiple views, and thus distinct intrinsic properties of each view may be ignored. How to jointly learn representations from structural and functional brain networks while preserving their inherent properties is a critical problem. In this paper, we propose a framework of Siamese community-preserving graph convolutional network (SCP-GCN) to learn the structural and functional joint embedding of brain networks. Specifically, we use graph convolutions to learn the structural and functional joint embedding, where the graph structure is defined with structural connectivity and node features are from the functional connectivity. Moreover, we propose to preserve the community structure of brain networks in the graph convolutions by considering the intra-community and inter-community properties in the learning process. Furthermore, we use Siamese architecture which models the pair-wise similarity learning to guide the learning process. To evaluate the proposed approach, we conduct extensive experiments on two real brain network datasets. The experimental results demonstrate the superior performance of the proposed approach in structural and functional joint embedding for neurological disorder analysis, indicating its promising value for clinical applications.

[66]  arXiv:1911.03607 (cross-list from cs.CV) [pdf, other]
Title: DeepMask: an algorithm for cloud and cloud shadow detection in optical satellite remote sensing images using deep residual network
Comments: 17 pages, 4 figures, 6 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Detecting and masking cloud and cloud shadow from satellite remote sensing images is a pervasive problem in the remote sensing community. Accurate and efficient detection of cloud and cloud shadow is an essential step to harness the value of remotely sensed data for almost all downstream analysis. DeepMask, a new algorithm for cloud and cloud shadow detection in optical satellite remote sensing imagery, is proposed in this study. DeepMask utilizes ResNet, a deep convolutional neural network, for pixel-level cloud mask generation. The algorithm is trained and evaluated on the Landsat 8 Cloud Cover Assessment Validation Dataset distributed across 8 different land types. Compared with CFMask, the most widely used cloud detection algorithm, land-type-specific DeepMask models achieve higher accuracy across all land types. The average accuracy is 93.56%, compared with 85.36% from CFMask. DeepMask also achieves 91.02% accuracy on the all-land-type dataset. Compared with other CNN-based cloud mask algorithms, DeepMask benefits from the parsimonious architecture and the residual connection of ResNet. It is compatible with input of any size and shape. DeepMask still maintains high performance when using only red, green, blue, and NIR bands, indicating its potential to be applied to other satellite platforms that only have limited optical bands.

[67]  arXiv:1911.03667 (cross-list from cs.LG) [pdf, other]
Title: Factored Latent-Dynamic Conditional Random Fields for Single and Multi-label Sequence Modeling
Comments: To be submitted to Journal of Machine Learning Research (JMLR)
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Machine Learning (stat.ML)

Conditional Random Fields (CRF) are frequently applied for labeling and segmenting sequence data. Morency et al. (2007) introduced hidden state variables in a labeled CRF structure in order to model the latent dynamics within class labels, thus improving the labeling performance. Such a model is known as Latent-Dynamic CRF (LDCRF). We present Factored LDCRF (FLDCRF), a structure that allows multiple latent dynamics of the class labels to interact with each other. Including such latent-dynamic interactions leads to improved labeling performance on single-label and multi-label sequence modeling tasks. We apply our FLDCRF models on two single-label (one nested cross-validation) and one multi-label sequence tagging (nested cross-validation) experiments across two different datasets - UCI gesture phase data and UCI opportunity data. FLDCRF outperforms all state-of-the-art sequence models, i.e., CRF, LDCRF, LSTM, LSTM-CRF, Factorial CRF, Coupled CRF and a multi-label LSTM model in all our experiments. In addition, LSTM based models display inconsistent performance across validation and test data, which made it difficult to select models on validation data during our experiments. FLDCRF offers easier model selection, consistency across validation and test performance and lucid model intuition. FLDCRF is also much faster to train compared to LSTM, even without a GPU. FLDCRF outperforms the best LSTM model by ~4% on a single-label task on UCI gesture phase data and outperforms LSTM by ~2% on average across nested cross-validation test sets on the multi-label sequence tagging experiment on UCI opportunity data. The idea of FLDCRF can be extended to joint (multi-agent interactions) and heterogeneous (discrete and continuous) state space models.

[68]  arXiv:1911.03725 (cross-list from cs.LG) [pdf, other]
Title: Tensor Regression Using Low-rank and Sparse Tucker Decompositions
Comments: 26 pages, 3 figures, 1 table; preprint of a journal article
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Statistics Theory (math.ST); Machine Learning (stat.ML)

This paper studies a tensor-structured linear regression model with a scalar response variable and tensor-structured predictors, such that the regression parameters form a tensor of order $d$ (i.e., a $d$-fold multiway array) in $\mathbb{R}^{n_1 \times n_2 \times \cdots \times n_d}$. This work focuses on the task of estimating the regression tensor from $m$ realizations of the response variable and the predictors where $m\ll n = \prod \nolimits_{i} n_i$. Despite the ill-posedness of this estimation problem, it can still be solved if the parameter tensor belongs to the space of sparse, low Tucker-rank tensors. Accordingly, the estimation procedure is posed as a non-convex optimization program over the space of sparse, low Tucker-rank tensors, and a tensor variant of projected gradient descent is proposed to solve the resulting non-convex problem. In addition, mathematical guarantees are provided that establish the proposed method converges to the correct solution under the right set of conditions. Further, an upper bound on sample complexity of tensor parameter estimation for the model under consideration is characterized for the special case when the individual (scalar) predictors independently draw values from a sub-Gaussian distribution. The sample complexity bound is shown to have a polylogarithmic dependence on $\bar{n} = \max \big\{n_i: i\in \{1,2,\ldots,d \} \big\}$ and, orderwise, it matches the bound one can obtain from a heuristic parameter counting argument. Finally, numerical experiments demonstrate the efficacy of the proposed tensor model and estimation method on a synthetic dataset and a neuroimaging dataset pertaining to attention deficit hyperactivity disorder. Specifically, the proposed method exhibits better sample complexities on both synthetic and real datasets, demonstrating the usefulness of the model and the method in settings where $n \gg m$.
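
As an illustration of the projected gradient idea described in this abstract, the following is a minimal NumPy sketch, not the authors' algorithm. It assumes a least-squares loss, uses a truncated higher-order SVD as an approximate projection onto low Tucker-rank tensors, and omits the sparsity constraint. The function names `unfold`, `truncated_hosvd`, and `pgd_step`, the step size, and the target ranks are all hypothetical.

```python
import numpy as np

def unfold(t, mode):
    """Mode-`mode` unfolding: rows index the chosen mode, columns the rest."""
    return np.moveaxis(t, mode, 0).reshape(t.shape[mode], -1)

def truncated_hosvd(t, ranks):
    """Approximate projection onto tensors of multilinear rank <= ranks.

    Each mode is projected onto the span of the leading left singular
    vectors of its unfolding (an assumption, not the paper's operator).
    """
    out = t
    for mode, r in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(t, mode), full_matrices=False)
        proj = u[:, :r] @ u[:, :r].T
        # Apply the projector along `mode`, then restore the axis order.
        out = np.moveaxis(np.tensordot(proj, out, axes=(1, mode)), 0, mode)
    return out

def pgd_step(b, x, y, step, ranks):
    """One gradient step on (1/2m) * sum_i (<X_i, B> - y_i)^2, then project."""
    m = x.shape[0]
    resid = np.tensordot(x, b, axes=b.ndim) - y      # shape (m,)
    grad = np.tensordot(resid, x, axes=(0, 0)) / m   # same shape as b
    return truncated_hosvd(b - step * grad, ranks)

rng = np.random.default_rng(0)
x = rng.standard_normal((50, 6, 5, 4))   # m = 50 tensor-valued predictors
b_true = truncated_hosvd(rng.standard_normal((6, 5, 4)), (2, 2, 2))
y = np.tensordot(x, b_true, axes=3)      # noiseless scalar responses
b_next = pgd_step(np.zeros((6, 5, 4)), x, y, step=0.5, ranks=(2, 2, 2))
```

The projection is the only step specific to the Tucker structure; a sparsity-promoting step such as hard thresholding would have to be added to match the model class named in the abstract.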

[69]  arXiv:1911.03730 (cross-list from cs.CY) [pdf]
Title: Forecasting the effect of heat stress index and climate change on cloud data center energy consumption
Subjects: Computers and Society (cs.CY); Signal Processing (eess.SP); Applications (stat.AP)

In this paper, we estimate the effect of heat stress index (a measure which takes into account rising temperatures as well as humidity) on data center energy consumption. We use forecasting models to predict future energy use by data centers, taking into account rising temperature scenarios. We compare those estimates with baseline forecasted energy consumption (without heat stress index or rising temperature correction) and present the result that there is a sizeable and significant difference in the two forecasts. We show that rising temperatures will cause a negative impact on data center energy consumption, increasing it by about 8 percent, and conclude that data center energy consumption analyses and forecasts must include the effects of heat stress index and rising temperatures and other climate change related effects.

[70]  arXiv:1911.03744 (cross-list from cs.IT) [pdf, ps, other]
Title: Estimation in Poisson Noise: Properties of the Conditional Mean Estimator
Comments: Short version was presented at ITW 2019 in Visby
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP); Statistics Theory (math.ST)

This paper considers estimation of a random variable in Poisson noise with signal scaling coefficient and dark current as explicit parameters of the noise model. Specifically, the paper focuses on properties of the conditional mean estimator as a function of the scaling coefficient, the dark current parameter, the distribution of the input random variable and channel realizations.
With respect to the scaling coefficient and the dark current, several identities in terms of derivatives are established. For example, it is shown that the derivative of the conditional mean estimator with respect to the dark current parameter is proportional to the conditional variance. Moreover, a version of the score function is proposed and a Tweedie-like formula for the conditional expectation is recovered.
With respect to the distribution, several regularity conditions are shown. For instance, it is shown that the conditional mean estimator uniquely determines the input distribution. Moreover, it is shown that if the conditional expectation is close to a linear function in the mean squared error, then the input distribution is approximately gamma in the L\'evy distance.
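
To make the "Tweedie-like formula" concrete, the sketch below numerically checks the classical Robbins identity for Poisson observations, which may differ in form from the one derived in the paper. It assumes the model Y | X = x ~ Poisson(a x + lam), with `a` standing for the scaling coefficient and `lam` for the dark current, and a hypothetical three-point prior on X. Under that model, E[aX + lam | Y = y] = (y + 1) P_Y(y + 1) / P_Y(y).

```python
from math import exp, factorial

def poisson_pmf(y, mean):
    """Probability that a Poisson variable with the given mean equals y."""
    return exp(-mean) * mean**y / factorial(y)

def marginal_pmf(y, xs, ps, a, lam):
    """P_Y(y) for a discrete prior with support xs and probabilities ps."""
    return sum(p * poisson_pmf(y, a * x + lam) for x, p in zip(xs, ps))

def cond_mean_direct(y, xs, ps, a, lam):
    """E[X | Y = y] computed from Bayes' rule."""
    weights = [p * poisson_pmf(y, a * x + lam) for x, p in zip(xs, ps)]
    return sum(w * x for w, x in zip(weights, xs)) / sum(weights)

def cond_mean_robbins(y, xs, ps, a, lam):
    """E[X | Y = y] computed only from the marginal of Y."""
    ratio = (y + 1) * marginal_pmf(y + 1, xs, ps, a, lam) / marginal_pmf(y, xs, ps, a, lam)
    return (ratio - lam) / a

# Hypothetical prior and noise parameters, chosen only for illustration.
xs, ps = [0.5, 2.0, 4.0], [0.2, 0.5, 0.3]
a, lam = 1.5, 0.3
gaps = [abs(cond_mean_direct(y, xs, ps, a, lam) - cond_mean_robbins(y, xs, ps, a, lam))
        for y in range(8)]
```

The two computations agree up to floating-point error, which shows that the conditional mean depends on the input distribution only through the output marginal.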

[71]  arXiv:1911.03754 (cross-list from cs.RO) [pdf, other]
Title: Hybrid Localization: A Low Cost, Low Complexity Approach Based on Wi-Fi and Odometry
Journal-ref: 2019 IEEE 90th Vehicular Technology Conference (VTC2019-Fall), Honolulu, HI, USA, 2019, pp. 1-7
Subjects: Robotics (cs.RO); Signal Processing (eess.SP)

Localization in indoor environments is essential to further support automation in a wide array of scenarios. Moreover, direction-of-arrival knowledge is essential to supporting high speed millimeter-wave (mmWave) links in indoor environments, since most mmWave links are of a line-of-sight nature to combat the high pathloss in this band. Accurate wireless localization in indoor environments, however, has proved a challenging task due to multi-path fading. Additionally, due to the effects of multi-path fading, methods such as trilateration alone do not result in accurate localization. As such, in this paper we propose to combine the knowledge of wireless localization methods with that of odometry sensors to track the location of a mobile robot. This paper presents significant real-world localization measurement results for both Wi-Fi and odometry in diverse environments at the Boise State University campus. Using these results, we devise an algorithm to combine data from both odometry and wireless localization. This algorithm is shown in hardware testing to reduce the localization error for a mobile robot.

[72]  arXiv:1911.03762 (cross-list from cs.CL) [pdf, other]
Title: Speaker Adaptation for Attention-Based End-to-End Speech Recognition
Comments: 5 pages, 3 figures, Interspeech 2019
Journal-ref: Interspeech 2019, Graz, Austria
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)

We propose three regularization-based speaker adaptation approaches to adapt the attention-based encoder-decoder (AED) model with very limited adaptation data from target speakers for end-to-end automatic speech recognition. The first method is Kullback-Leibler divergence (KLD) regularization, in which the output distribution of a speaker-dependent (SD) AED is forced to be close to that of the speaker-independent (SI) model by adding a KLD regularization to the adaptation criterion. To compensate for the asymmetric deficiency in KLD regularization, an adversarial speaker adaptation (ASA) method is proposed to regularize the deep-feature distribution of the SD AED through the adversarial learning of an auxiliary discriminator and the SD AED. The third approach is the multi-task learning, in which an SD AED is trained to jointly perform the primary task of predicting a large number of output units and an auxiliary task of predicting a small number of output units to alleviate the target sparsity issue. Evaluated on a Microsoft short message dictation task, all three methods are highly effective in adapting the AED model, achieving up to 12.2% and 3.0% word error rate improvement over an SI AED trained from 3400 hours data for supervised and unsupervised adaptation, respectively.
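
The KLD regularization described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' criterion: the interpolation weight `rho`, the direction KL(SI || SD), and the function name `kld_regularized_loss` are hypothetical choices.

```python
import numpy as np

def kld_regularized_loss(sd_probs, si_probs, labels, rho):
    """(1 - rho) * cross-entropy on labels + rho * KL(SI || SD), averaged over tokens.

    sd_probs, si_probs: arrays of shape (n_tokens, n_units) whose rows sum to 1.
    labels: integer array of shape (n_tokens,) with the adaptation targets.
    """
    eps = 1e-12  # guards against log(0)
    n = len(labels)
    ce = -np.log(sd_probs[np.arange(n), labels] + eps).mean()
    kl = np.sum(si_probs * (np.log(si_probs + eps) - np.log(sd_probs + eps)), axis=1).mean()
    return (1.0 - rho) * ce + rho * kl

# Toy posteriors over 3 output units for 2 tokens (made-up numbers).
si = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
sd = np.array([[0.6, 0.3, 0.1], [0.2, 0.7, 0.1]])
labels = np.array([0, 1])
loss_no_reg = kld_regularized_loss(sd, si, labels, rho=0.0)
loss_reg = kld_regularized_loss(sd, si, labels, rho=0.5)
loss_same = kld_regularized_loss(si, si, labels, rho=0.5)
```

With `rho = 0` the loss reduces to plain cross-entropy on the adaptation labels; larger values of `rho` pull the speaker-dependent outputs toward the speaker-independent ones.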

[73]  arXiv:1911.03774 (cross-list from math.DS) [pdf, other]
Title: A notion of equivalence for linear complementarity problems with application to the design of non-smooth bifurcations
Comments: Submitted to the IFAC 2020 World Congress
Subjects: Dynamical Systems (math.DS); Systems and Control (eess.SY)

Many systems of interest to control engineering can be modeled by linear complementarity problems. We introduce a new notion of equivalence between linear complementarity problems that sets the basis to translate the powerful tools of smooth bifurcation theory to this class of models. Leveraging this notion of equivalence, we introduce new tools to analyze, classify, and design non-smooth bifurcations in linear complementarity problems and their interconnection.

[74]  arXiv:1911.03803 (cross-list from cs.LG) [pdf, other]
Title: XceptionTime: A Novel Deep Architecture based on Depthwise Separable Convolutions for Hand Gesture Classification
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Machine Learning (stat.ML)

Capitalizing on the need for addressing the existing challenges associated with gesture recognition via sparse multichannel surface Electromyography (sEMG) signals, the paper proposes a novel deep learning model, referred to as the XceptionTime architecture. The proposed innovative XceptionTime is designed by integration of depthwise separable convolutions, adaptive average pooling, and a novel non-linear normalization technique. At the heart of the proposed architecture are several XceptionTime modules concatenated in series fashion designed to capture both temporal and spatial information-bearing contents of the sparse multichannel sEMG signals without the need for data augmentation and/or manual design of feature extraction. In addition, through integration of adaptive average pooling, Conv1D, and the non-linear normalization approach, XceptionTime is less prone to overfitting, more robust to temporal translation of the input, and more importantly is independent of the input window size. Finally, by utilizing the depthwise separable convolutions, the XceptionTime network has far fewer parameters resulting in a less complex network. The performance of XceptionTime is tested on a sub Ninapro dataset, DB1, and the results showed a superior performance in comparison to any existing counterparts. In this regard, a 5.71% accuracy improvement, on a window size of 200 ms, is reported in this paper for the first time.
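
The claim that depthwise separable convolutions need far fewer parameters can be checked with simple arithmetic. The sketch below counts weights for a one-dimensional convolution, ignoring biases; the channel counts and kernel size are hypothetical and are not taken from the XceptionTime paper.

```python
def conv1d_params(c_in, c_out, kernel):
    """Weights in a standard 1D convolution (no bias)."""
    return kernel * c_in * c_out

def separable_conv1d_params(c_in, c_out, kernel):
    """Weights in a depthwise (one filter per input channel) plus pointwise (1x1) pair."""
    depthwise = kernel * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise

# Hypothetical layer: 64 input channels, 128 output channels, kernel size 9.
standard = conv1d_params(64, 128, 9)              # 73728
separable = separable_conv1d_params(64, 128, 9)   # 8768
reduction = standard / separable
```

For this layer the separable form uses roughly one eighth of the weights, and the saving grows with the kernel size and the number of output channels.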

[75]  arXiv:1911.03810 (cross-list from math.OC) [pdf, other]
Title: Parameter Estimation in Adaptive Control of Time-Varying Systems Under a Range of Excitation Conditions
Comments: 8 Pages, preliminary draft
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Systems and Control (eess.SY)

This paper presents a new parameter estimation algorithm for the adaptive control of a class of time-varying plants. The main feature of this algorithm is a matrix of time-varying learning rates, which enables parameter estimation error trajectories to tend exponentially fast towards a compact set whenever excitation conditions are satisfied. This algorithm is employed in a large class of problems where unknown parameters are present and are time-varying. It is shown that this algorithm guarantees global boundedness of the state and parameter errors of the system, and avoids an often used filtering approach for constructing key regressor signals. In addition, intervals of time over which these errors tend exponentially fast toward a compact set are provided, both in the presence of finite and persistent excitation. A projection operator is used to ensure the boundedness of the learning rate matrix, as compared to a time-varying forgetting factor. Numerical simulations are provided to complement the theoretical analysis.

[76]  arXiv:1911.03952 (cross-list from cs.SD) [pdf, other]
Title: Transformation of low-quality device-recorded speech to high-quality speech using improved SEGAN model
Comments: This study was conducted during an internship of the first author at NII, Japan in 2017
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

Nowadays vast amounts of speech data are recorded from low-quality recorder devices such as smartphones, tablets, laptops, and medium-quality microphones. The objective of this research was to study the automatic generation of high-quality speech from such low-quality device-recorded speech, which could then be applied to many speech-generation tasks. In this paper, we first introduce our new device-recorded speech dataset then propose an improved end-to-end method for automatically transforming the low-quality device-recorded speech into professional high-quality speech. Our method is an extension of a generative adversarial network (GAN)-based speech enhancement model called speech enhancement GAN (SEGAN), and we present two modifications to make model training more robust and stable. Finally, from a large-scale listening test, we show that our method can significantly enhance the quality of device-recorded speech signals.

[77]  arXiv:1911.04017 (cross-list from physics.med-ph) [pdf]
Title: Quantitative T2 Estimation Using Radial Turbo Spin Echo Imaging
Comments: 16 pages, 8 figures, Technical Note on Methodology
Subjects: Medical Physics (physics.med-ph); Image and Video Processing (eess.IV)

There has been increased interest in the quantitative characterization of tissues based on T2 in abdominal imaging. Techniques based on spin-echo or turbo spin-echo sequences are time consuming because they require multiple acquisitions for obtaining an adequate number of TE images for accurate T2 mapping. Radial turbo spin echo (RADTSE) based methods have been shown to generate accurate T2 maps from a single acquisition using highly undersampled data. In this work, we present details of the RADTSE technique, summarizing developments related to the design of pulse sequence and reconstruction algorithms. Results from phantom and in vivo imaging experiments are also presented.

[78]  arXiv:1911.04018 (cross-list from cs.LG) [pdf, other]
Title: Feedback Recurrent AutoEncoder
Subjects: Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)

In this work, we propose a new recurrent autoencoder architecture, termed Feedback Recurrent AutoEncoder (FRAE), for online compression of sequential data with temporal dependency. The recurrent structure of FRAE is designed to efficiently extract the redundancy along the time dimension and allows a compact discrete representation of the data to be learned. We demonstrate its effectiveness in speech spectrogram compression. Specifically, we show that the FRAE, paired with a powerful neural vocoder, can produce high-quality speech waveforms at a low, fixed bitrate. We further show that by adding a learned prior for the latent space and using an entropy coder, we can achieve an even lower variable bitrate.

[79]  arXiv:1911.04031 (cross-list from cs.IT) [pdf, other]
Title: Intermittent Information-Driven Search for Underwater Targets
Comments: 6 pages
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

The problem is area-restricted search for targets using an autonomous mobile sensing platform. Detection is imperfect: the probability of detection depends on the range to the target, while the probability of false detections is non-zero. The paper develops an intermittent information-driven search strategy, which combines a fast, non-receptive displacement phase (ballistic phase) with a slow displacement sensing phase. Decisions where to move next, both in the ballistic phase and the slow displacement phase, are information-driven: they maximise the expected information gain. The paper demonstrates the efficiency of the proposed strategy in the context of a search for underwater targets: the searcher is an autonomous amphibious drone which can both fly and land on or take off from the sea surface.

[80]  arXiv:1911.04048 (cross-list from stat.ML) [pdf, other]
Title: Multidataset Independent Subspace Analysis with Application to Multimodal Fusion
Authors: Rogers F. Silva (1 and 2), Sergey M. Plis (1 and 2), Tulay Adali (3), Marios S. Pattichis (4), Vince D. Calhoun (1 and 2) ((1) Tri-Institutional Center for Translational Research in Neuroimaging and Data Science (TReNDS), Georgia State University, Georgia Institute of Technology, and Emory University, Atlanta, GA, USA, (2) The Mind Research Network, Albuquerque, NM, USA, (3) Dept. of CSEE, University of Maryland Baltimore County, Baltimore, Maryland, USA, (4) Dept. of ECE at The University of New Mexico, Albuquerque, NM, USA)
Comments: For associated code, see this https URL. For associated data, see this https URL. Submitted to IEEE Transactions on Image Processing on Nov/7/2019: 13 pages, 8 figures. Supplement: 16 pages, 5 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Signal Processing (eess.SP); Applications (stat.AP)

In the last two decades, unsupervised latent variable models---blind source separation (BSS) especially---have enjoyed a strong reputation for the interpretable features they produce. Seldom do these models combine the rich diversity of information available in multiple datasets. Multidatasets, on the other hand, yield joint solutions otherwise unavailable in isolation, with a potential for pivotal insights into complex systems.
To take advantage of the complex multidimensional subspace structures that capture underlying modes of shared and unique variability across and within datasets, we present a direct, principled approach to multidataset combination. We design a new method called multidataset independent subspace analysis (MISA) that leverages joint information from multiple heterogeneous datasets in a flexible and synergistic fashion.
Methodological innovations exploiting the Kotz distribution for subspace modeling in conjunction with a novel combinatorial optimization for evasion of local minima enable MISA to produce a robust generalization of independent component analysis (ICA), independent vector analysis (IVA), and independent subspace analysis (ISA) in a single unified model.
We highlight the utility of MISA for multimodal information fusion, including sample-poor regimes and low signal-to-noise ratio scenarios, promoting novel applications in both unimodal and multimodal brain imaging data.

[81]  arXiv:1911.04092 (cross-list from physics.geo-ph) [pdf]
Title: Seismic data interpolation based on U-net with texture loss
Subjects: Geophysics (physics.geo-ph); Machine Learning (cs.LG); Image and Video Processing (eess.IV)

Missing traces are a common occurrence during the collection of seismic data. Deep neural networks (DNNs) have shown considerable promise in restoring incomplete seismic data. However, several DNN-based approaches ignore the specific characteristics of seismic data itself, and only focus on reducing the difference between the recovered and the original signals. In this study, a novel Seismic U-net InterpolaTor (SUIT) is proposed to preserve the seismic texture information while reconstructing the missing traces. Aside from minimizing the reconstruction error, SUIT enhances the texture consistency between the recovered and the original complete seismic data, by designing a pre-trained U-Net to extract the texture information. The experiments show that our method outperforms the classic state-of-the-art methods in terms of robustness.

[82]  arXiv:1911.04220 (cross-list from cs.GT) [pdf, other]
Title: Non-Cooperative Inverse Reinforcement Learning
Subjects: Computer Science and Game Theory (cs.GT); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Systems and Control (eess.SY)

Making decisions in the presence of a strategic opponent requires one to take into account the opponent's ability to actively mask its intended objective. To describe such strategic situations, we introduce the non-cooperative inverse reinforcement learning (N-CIRL) formalism. The N-CIRL formalism consists of two agents with completely misaligned objectives, where only one of the agents knows the true objective function. Formally, we model the N-CIRL formalism as a zero-sum Markov game with one-sided incomplete information. Through interacting with the more informed player, the less informed player attempts to both infer, and act according to, the true objective function. As a result of the one-sided incomplete information, the multi-stage game can be decomposed into a sequence of single-stage games expressed by a recursive formula. Solving this recursive formula yields the value of the N-CIRL game and the more informed player's equilibrium strategy. Another recursive formula, constructed by forming an auxiliary game, termed the dual game, yields the less informed player's strategy. Building upon these two recursive formulas, we develop a computationally tractable algorithm to approximately solve for the equilibrium strategies. Finally, we demonstrate the benefits of our N-CIRL formalism over the existing multi-agent IRL formalism via extensive numerical simulation in a novel cyber security setting.

[83]  arXiv:1911.04261 (cross-list from cs.SD) [pdf, other]
Title: Voice Activity Detection in presence of background noise using EEG
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)

In this paper we demonstrate that the performance of a voice activity detection (VAD) system operating in the presence of background noise can be improved by concatenating acoustic input features with electroencephalography (EEG) features. We also demonstrate that VAD using only EEG features shows better performance than VAD using only acoustic features in the presence of background noise. We implemented a recurrent neural network (RNN) based VAD system, and we demonstrate our results for two different data sets recorded in the presence of different noise conditions.

[84]  arXiv:1911.04279 (cross-list from physics.soc-ph) [pdf, other]
Title: Community Detection for Power Systems Network Aggregation Considering Renewable Variability
Subjects: Physics and Society (physics.soc-ph); Machine Learning (cs.LG); Systems and Control (eess.SY)

The increasing penetration of variable renewable energy (VRE) has brought significant challenges for power systems planning and operation. These highly variable sources are typically distributed in the grid; therefore, a detailed representation of transmission bottlenecks is fundamental to approximate the impact of the transmission network on the dispatch with VRE resources. The fine grain temporal scale of short term and day-ahead dispatch, taking into account the network constraints, also mandatory for mid-term planning studies, combined with the high variability of the VRE has brought the need to represent these uncertainties in stochastic optimization models while taking into account the transmission system. These requirements impose a computational burden to solve the planning and operation models. We propose a methodology based on community detection to aggregate the network representation, capable of preserving the locational marginal price (LMP) differences in multiple VRE scenarios, and describe a real-world operational planning study. The optimal expected cost solution considering aggregated networks is compared with the full network representation. Both representations were embedded in an operation model relying on Stochastic Dual Dynamic Programming (SDDP) to deal with the random variables in a multi-stage problem.

[85]  arXiv:1911.04283 (cross-list from cs.CL) [pdf, other]
Title: Data Efficient Direct Speech-to-Text Translation with Modality Agnostic Meta-Learning
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)

End-to-end Speech Translation (ST) models have several advantages such as lower latency, smaller model size, and less error compounding over conventional pipelines that combine Automatic Speech Recognition (ASR) and text Machine Translation (MT) models. However, collecting large amounts of parallel data for the ST task is more difficult than for the ASR and MT tasks. Previous studies have proposed the use of transfer learning approaches to overcome the above difficulty. These approaches benefit from weakly supervised training data, such as ASR speech-to-transcript or MT text-to-text translation pairs. However, the parameters in these models are updated independently of each task, which may lead to sub-optimal solutions. In this work, we adopt a meta-learning algorithm to train a modality agnostic multi-task model that transfers knowledge from the source tasks (ASR and MT) to the target task (ST), where the ST task severely lacks data. In the meta-learning phase, the parameters of the model are exposed to vast amounts of speech transcripts (e.g., English ASR) and text translations (e.g., English-German MT). During this phase, parameters are updated in such a way as to understand speech, text representations, and the relation between them, as well as to act as a good initialization point for the target ST task. We evaluate the proposed meta-learning approach for ST tasks on English-German (En-De) and English-French (En-Fr) language pairs from the Multilingual Speech Translation Corpus (MuST-C). Our method outperforms the previous transfer learning approaches and sets new state-of-the-art results for En-De and En-Fr ST tasks by obtaining 9.18 and 11.76 BLEU point improvements, respectively.

[86]  arXiv:1911.04292 (cross-list from cs.CL) [pdf, other]
Title: Diversity by Phonetics and its Application in Neural Machine Translation
Comments: In openreview.net (28 May 2019)
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)

We introduce a powerful approach for Neural Machine Translation (NMT), whereby, during training and testing, together with the input we provide its phonetic encoding and the variants of such an encoding. This way we obtain very significant improvements of up to 4 BLEU points over the state-of-the-art large-scale system. The phonetic encoding is the first part of our contribution, with a second being a theory that aims to understand the reason for this improvement. Our hypothesis states that the phonetic encoding helps NMT because it encodes a procedure to emphasize the difference between semantically diverse sentences. We conduct an empirical geometric validation of our hypothesis in support of which we obtain overwhelming evidence. Subsequently, as our third contribution and based on our theory, we develop artificial mechanisms that leverage during learning the hypothesized (and verified) effect of phonetics. We achieve significant and consistent improvements over all language pairs and datasets: French-English, German-English, and Chinese-English in medium task IWSLT'17 and French-English in large task WMT'18 Bio, with up to 4 BLEU points over the state-of-the-art. Moreover, our approaches are more robust than baselines when evaluated on unknown out-of-domain test sets with up to a 5 BLEU point increase.

[87]  arXiv:1911.04317 (cross-list from stat.OT) [pdf]
Title: Machine Learning for high speed channel optimization
Comments: 3 Pages
Subjects: Other Statistics (stat.OT); Machine Learning (cs.LG); Signal Processing (eess.SP); Machine Learning (stat.ML)

Design of printed circuit board (PCB) stack-up requires the consideration of characteristic impedance, insertion loss and crosstalk. As there are many parameters in a PCB stack-up design, the optimization of these parameters needs to be efficient and accurate. A less optimal stack-up would lead to expensive PCB material choices in high speed designs. In this paper, an efficient global optimization method using parallel and intelligent Bayesian optimization is proposed for the stripline design.

[88]  arXiv:1911.04338 (cross-list from cs.LG) [pdf, ps, other]
Title: Active Learning for Black-Box Adversarial Attacks in EEG-Based Brain-Computer Interfaces
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Human-Computer Interaction (cs.HC); Signal Processing (eess.SP)

Deep learning has made significant breakthroughs in many fields, including electroencephalogram (EEG) based brain-computer interfaces (BCIs). However, deep learning models are vulnerable to adversarial attacks, in which deliberately designed small perturbations are added to the benign input samples to fool the deep learning model and degrade its performance. This paper considers transferability-based black-box attacks, where the attacker trains a substitute model to approximate the target model, and then generates adversarial examples from the substitute model to attack the target model. Learning a good substitute model is critical to the success of these attacks, but it requires a large number of queries to the target model. We propose a novel framework which uses query synthesis based active learning to improve the query efficiency in training the substitute model. Experiments on three convolutional neural network (CNN) classifiers and three EEG datasets demonstrated that our method can improve the attack success rate with the same number of queries, or, in other words, our method requires fewer queries to achieve a desired attack performance. To our knowledge, this is the first work that integrates active learning and adversarial attacks for EEG-based BCIs.

[89]  arXiv:1911.04385 (cross-list from cs.SD) [pdf, other]
Title: Visualizing and Understanding Self-attention based Music Tagging
Comments: Machine Learning for Music Discovery Workshop (ML4MD) at ICML 2019
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

Recently, we proposed a self-attention based music tagging model. Different from most of the conventional deep architectures in music information retrieval, which use stacked 3x3 filters by treating music spectrograms as images, the proposed self-attention based model attempted to regard music as a temporal sequence of individual audio events. Beyond its performance, the model could also facilitate better interpretability. In this paper, we mainly focus on visualizing and understanding the proposed self-attention based music tagging model.

[90]  arXiv:1911.04386 (cross-list from stat.ML) [pdf, other]
Title: Fault Detection and Identification using Bayesian Recurrent Neural Networks
Comments: 42 pages, 23 figures. Preprint submitted to Computers & Chemical Engineering
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Systems and Control (eess.SY)

In processing and manufacturing industries, there has been a large push to produce higher quality products and ensure maximum efficiency of processes. This requires approaches to effectively detect and resolve disturbances to ensure optimal operations. While the control system can compensate for many types of disturbances, there are changes to the process which it still cannot handle adequately. It is therefore important to further develop monitoring systems to effectively detect and identify those faults such that they can be quickly resolved by operators. In this paper, a novel probabilistic fault detection and identification method is proposed which adopts a newly developed deep learning approach using Bayesian recurrent neural networks (BRNNs) with variational dropout. The BRNN model is general and can model complex nonlinear dynamics. Moreover, compared to traditional statistic-based data-driven fault detection and identification methods, the proposed BRNN-based method yields uncertainty estimates which allow for simultaneous fault detection of chemical processes, direct fault identification, and fault propagation analysis. The outstanding performance of this method is demonstrated and contrasted to (dynamic) principal component analysis, which is widely applied in the industry, in the benchmark Tennessee Eastman process (TEP) and a real chemical manufacturing dataset.

Replacements for Tue, 12 Nov 19

[91]  arXiv:1701.08070 (replaced) [pdf]
Title: An improved parametric model for hysteresis loop approximation
Comments: Preprint of research article, 35 pages, 37 figures, 1 table
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY); Instrumentation and Detectors (physics.ins-det)
[92]  arXiv:1803.05428 (replaced) [pdf, other]
Title: A Hierarchical Latent Vector Model for Learning Long-Term Structure in Music
Comments: ICML Camera Ready Version (w/ fixed typos)
Journal-ref: ICML 2018
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[93]  arXiv:1807.05408 (replaced) [pdf, other]
Title: Non-contact Vital Signs Monitoring through Light Sensing
Subjects: Signal Processing (eess.SP)
[94]  arXiv:1810.09820 (replaced) [pdf, ps, other]
Title: Learning Optimal Scheduling Policy for Remote State Estimation under Uncertain Channel Condition
Comments: Full Version
Subjects: Systems and Control (eess.SY)
[95]  arXiv:1812.02128 (replaced) [pdf, other]
Title: A Control-Theoretic Approach for Scalable and Robust Traffic Density Estimation using Convex Optimization
Comments: IEEE Transactions on Intelligent Transportation Systems, In Press
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
[96]  arXiv:1903.06380 (replaced) [pdf, ps, other]
Title: A Deep Learning Approach for Automotive Radar Interference Mitigation
Comments: Accepted in 2018 VTC workshop
Subjects: Signal Processing (eess.SP)
[97]  arXiv:1903.11593 (replaced) [pdf, other]
Title: Deep segmentation networks predict survival of non-small cell lung cancer
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[98]  arXiv:1904.01505 (replaced) [pdf, other]
Title: Structural Completeness of a Multi-channel Linear System with Dependent Parameters
Subjects: Signal Processing (eess.SP); Systems and Control (eess.SY)
[99]  arXiv:1905.01654 (replaced) [pdf, other]
Title: Optimal Beamforming for Hybrid Satellite Terrestrial Networks with Nonlinear PA and Imperfect CSIT
Comments: 5 pages, 5 figures, journal
Journal-ref: IEEE Wireless Communications Letters, 2019
Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
[100]  arXiv:1905.01898 (replaced) [pdf]
Title: Learning with Learned Loss Function: Speech Enhancement with Quality-Net to Improve Perceptual Evaluation of Speech Quality
Comments: Accepted by IEEE Signal Processing Letters (SPL)
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[101]  arXiv:1905.02458 (replaced) [pdf, other]
Title: Reachability analysis of linear hybrid systems via block decomposition
Subjects: Systems and Control (eess.SY); Dynamical Systems (math.DS); Optimization and Control (math.OC)
[102]  arXiv:1905.05916 (replaced) [pdf]
Title: Unsupervised Deep Contrast Enhancement with Power Constraint for OLED Displays
Comments: Accepted to IEEE Transactions on Image Processing. To be published
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[103]  arXiv:1905.11235 (replaced) [pdf, other]
Title: CIF: Continuous Integrate-and-Fire for End-to-End Speech Recognition
Authors: Linhao Dong, Bo Xu
Comments: 4 pages, 3 figures
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[104]  arXiv:1906.04419 (replaced) [pdf, other]
Title: Deep learning analysis of coronary arteries in cardiac CT angiography for detection of patients requiring invasive coronary angiography
Comments: This work has been accepted to IEEE TMI for publication
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[105]  arXiv:1906.11355 (replaced) [pdf, other]
Title: Multidimensional Contrast Limited Adaptive Histogram Equalization
Subjects: Image and Video Processing (eess.IV); Signal Processing (eess.SP); Data Analysis, Statistics and Probability (physics.data-an); Quantitative Methods (q-bio.QM)
[106]  arXiv:1906.11710 (replaced) [pdf, other]
Title: The shocklet transform: A decomposition method for the identification of local, mechanism-driven dynamics in sociotechnical time series
Comments: 29 pages (20 body, 9 appendix), 20 figures (13 body, 7 appendix), three online appendices available at this http URL (two displaying interactive visualizations and one containing over 10,000 figures), open-source implementation of STAR algorithm and discrete shocklet transform available at this https URL
Subjects: Physics and Society (physics.soc-ph); Data Structures and Algorithms (cs.DS); Signal Processing (eess.SP); Data Analysis, Statistics and Probability (physics.data-an)
[107]  arXiv:1907.08750 (replaced) [pdf, other]
Title: Double-Sided Massive MIMO Transceivers for MmWave Communications
Journal-ref: IEEE Access, Issue Date: December 2019, Volume: 7, Issue:1, Pages 157667-157679
Subjects: Signal Processing (eess.SP)
[108]  arXiv:1907.10086 (replaced) [pdf, ps, other]
Title: Ex-ante dynamic network tariffs for transmission cost recovery
Subjects: Systems and Control (eess.SY)
[109]  arXiv:1907.10634 (replaced) [pdf, other]
Title: Warp and Learn: Novel Views Generation for Vehicles and Other Objects
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
[110]  arXiv:1908.03133 (replaced) [pdf, other]
Title: Demystifying the Power Scaling Law of Intelligent Reflecting Surfaces and Metasurfaces
Comments: To appear at IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), 2019, 5 pages, 4 figures
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)
[111]  arXiv:1908.04752 (replaced) [pdf, other]
Title: Identification of relevant diffusion MRI metrics impacting cognitive functions using a novel feature selection method
Subjects: Image and Video Processing (eess.IV); Machine Learning (cs.LG); Quantitative Methods (q-bio.QM); Machine Learning (stat.ML)
[112]  arXiv:1908.07344 (replaced) [pdf, other]
Title: Unsupervised Multi-modal Style Transfer for Cardiac MR Segmentation
Comments: STACOM 2019 camera-ready. Winner of Multi-sequence Cardiac MR Segmentation Challenge (MS-CMRSeg 2019) this https URL
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[113]  arXiv:1908.07362 (replaced) [pdf, other]
Title: A Novel method for IDC Prediction in Breast Cancer Histopathology images using Deep Residual Neural Networks
Comments: Accepted at 2nd International Conference on Intelligent Communication and Computational Techniques, 2019
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[114]  arXiv:1908.11834 (replaced) [pdf, other]
Title: Rethinking Irregular Scene Text Recognition
Comments: Technical report for participation in ICDAR2019-ArT recognition track
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
[115]  arXiv:1909.02799 (replaced) [pdf, other]
Title: Deep Learning for Brain Tumor Segmentation in Radiosurgery: Prospective Clinical Evaluation
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Medical Physics (physics.med-ph)
[116]  arXiv:1909.08758 (replaced) [pdf]
Title: Extracting Super-resolution Structures inside a Single Molecule or Overlapped Molecules from One Blurred Image
Subjects: Image and Video Processing (eess.IV); Human-Computer Interaction (cs.HC)
[117]  arXiv:1910.06220 (replaced) [pdf, other]
Title: Joint Active and Passive Beamforming Optimization for Intelligent Reflecting Surface Assisted SWIPT under QoS Constraints
Comments: We address the QoS-constrained beamforming optimization problem in IRS-aided SWIPT systems. More interesting works and an overview on IRS can be found at this https URL
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
[118]  arXiv:1910.07672 (replaced) [pdf, other]
Title: A General Scenario Theory For Security-Constrained Unit Commitment With Probabilistic Guarantees
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
[119]  arXiv:1910.09857 (replaced) [pdf, other]
Title: LSTM-based Online Learning: An Efficient EKF Based Algorithm with a Convergence Guarantee
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Machine Learning (stat.ML)
[120]  arXiv:1910.13290 (replaced) [pdf, other]
Title: Adaptive Causal Network Coding with Feedback for Multipath Multi-hop Communications
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
[121]  arXiv:1911.00957 (replaced) [pdf, other]
Title: Learning Structure via Consensus for Face Segmentation and Parsing
Comments: 13 pages, 11 figures, technical report (improved figure resolution, updated related work)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
[122]  arXiv:1911.01062 (replaced) [pdf, other]
Title: PGU-net+: Progressive Growing of U-net+ for Automated Cervical Nuclei Segmentation
Comments: MICCAI workshop MMMI2019 Best Student Paper Award
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[123]  arXiv:1911.03317 (replaced) [pdf, other]
Title: MDP based Decision Support for Earthquake Damaged Distribution System Restoration
Subjects: Systems and Control (eess.SY)