Electrical Engineering and Systems Science

New submissions for Tue, 11 May 21

[1]  arXiv:2105.03461 [pdf]
Title: Impact of DER Communication Delay in AGC: Cyber-Physical Dynamic Simulation
Subjects: Systems and Control (eess.SY)

Distributed energy resource (DER) frequency regulations are promising technologies for future grid operation. Unlike conventional generators, DERs might require open communication networks to exchange signals with control centers, possibly through DER aggregators; therefore, the impacts of the communication variations on the system stability need to be investigated. This paper develops a cyber-physical dynamic simulation model based on the Hierarchical Engine for Large-Scale Co-Simulation (HELICS) to evaluate the impact of the communication variations, such as delays in DER frequency regulations. The feasible delay range can be obtained under different parameter settings. The results show that the risk of instability generally increases with the communication delay.

[2]  arXiv:2105.03509 [pdf]
Title: Wyner wiretap-like encoding scheme for cyber-physical systems
Journal-ref: IET Cyber-Physical Systems: Theory & Applications, Vol. 5, No. 4, pp. 359-365, 2020
Subjects: Systems and Control (eess.SY); Cryptography and Security (cs.CR)

In this study, the authors consider the problem of exchanging secret messages in cyber-physical systems (CPSs) without resorting to cryptographic solutions. In particular, they consider a CPS where the networked controller wants to send a secret message to the plant. They show that such a problem can be solved by exploiting a Wyner wiretap-like encoding scheme taking advantage of the closed-loop operations typical of feedback control systems. Specifically, by resorting to the control concept of one-step reachable sets, they first show that a wiretap-like encoding scheme exists whenever there is an asymmetry in the plant model knowledge available to the control system (the defender) and to the eavesdropper. The effectiveness of the proposed scheme is confirmed by means of a numerical example. Finally, they conclude the study by presenting open design challenges that can be addressed by the research community to improve, in different directions, the secret message exchange problem in CPSs.

[3]  arXiv:2105.03535 [pdf, other]
Title: Detection of Clouds in Multiple Wind Velocity Fields using Ground-based Infrared Sky Images
Subjects: Image and Video Processing (eess.IV)

Horizontal atmospheric wind shear causes wind velocity fields to have different directions and speeds. In images of clouds acquired using ground-based sky imaging systems, clouds may be moving in different wind layers. To increase the performance of a global solar irradiance forecasting algorithm, it is important to detect multiple layers of clouds. The information obtained from a global solar irradiance forecasting algorithm is necessary to optimize and schedule the solar generation resources and storage devices in a smart grid. This investigation studies the performance of unsupervised learning techniques when detecting the number of cloud layers in cloud images. The images are acquired using an innovative infrared sky imaging system mounted on a solar tracker. Different mixture models are used to infer the distribution of the cloud features. The optimal number of clusters in the mixture models is decided after implementing different Bayesian metrics and comparing these with a temporal Ising model. The motion vectors are computed using a pyramidal weighted implementation of the Lucas-Kanade algorithm. The correlations between the cloud velocity vectors and temperatures are analyzed to find the method that leads to the most accurate results. We have found that the temporal Ising model outperformed the Bayesian metrics in detection accuracy.

[4]  arXiv:2105.03542 [pdf, other]
Title: Zero-Shot Personalized Speech Enhancement through Speaker-Informed Model Selection
Comments: 5 pages, 3 figures, submitted to 2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)

This paper presents a novel zero-shot learning approach towards personalized speech enhancement through the use of a sparsely active ensemble model. Optimizing speech denoising systems towards a particular test-time speaker can improve performance and reduce run-time complexity. However, test-time model adaptation may be challenging if collecting data from the test-time speaker is not possible. To this end, we propose using an ensemble model wherein each specialist module denoises noisy utterances from a distinct partition of training set speakers. The gating module inexpensively estimates test-time speaker characteristics in the form of an embedding vector and selects the most appropriate specialist module for denoising the test signal. Grouping the training set speakers into non-overlapping semantically similar groups is non-trivial and ill-defined. To do this, we first train a Siamese network using noisy speech pairs to maximize or minimize the similarity of its output vectors depending on whether the utterances derive from the same speaker or not. Next, we perform k-means clustering on the latent space formed by the averaged embedding vectors per training set speaker. In this way, we designate speaker groups and train specialist modules optimized around partitions of the complete training set. Our experiments show that ensemble models made up of low-capacity specialists can outperform high-capacity generalist models with greater efficiency and improved adaptation towards unseen test-time speakers.
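The grouping and gating steps described in this abstract can be illustrated with a minimal sketch. It assumes tiny hand-made 2-D embeddings in place of the Siamese network outputs, hypothetical speaker names, and a nearest-centroid rule for the gating decision; none of these details come from the paper itself.

```python
import math

def mean_vector(vectors):
    """Average a list of equal-length vectors component-wise."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda k: math.dist(point, centroids[k]))

def kmeans(points, centroids, iterations=20):
    """Plain Lloyd iterations starting from the given initial centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Keep the previous centroid if a cluster ends up empty.
        centroids = [mean_vector(c) if c else old
                     for c, old in zip(clusters, centroids)]
    return centroids

# Hypothetical utterance embeddings for four training speakers.
utterances = {
    "spk_a": [[0.9, 0.1], [1.1, -0.1]],
    "spk_b": [[1.0, 0.3], [0.8, 0.1]],
    "spk_c": [[-1.0, 0.9], [-0.8, 1.1]],
    "spk_d": [[-1.2, 1.0], [-1.0, 1.2]],
}

# Step 1: average the embeddings of each training speaker.
speaker_means = {s: mean_vector(u) for s, u in utterances.items()}

# Step 2: cluster the per-speaker averages into two speaker groups.
centroids = kmeans(list(speaker_means.values()), [[1.0, 0.0], [-1.0, 1.0]])
groups = {s: nearest(m, centroids) for s, m in speaker_means.items()}

# Step 3 (gating): route an unseen speaker to the closest group's specialist.
test_embedding = [-0.9, 0.8]
selected = nearest(test_embedding, centroids)
```

Here `groups` maps each training speaker to a specialist index, and `selected` is the specialist that would denoise the test signal under the nearest-centroid assumption.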

[5]  arXiv:2105.03544 [pdf, other]
Title: Test-Time Adaptation Toward Personalized Speech Enhancement: Zero-Shot Learning with Knowledge Distillation
Authors: Sunwoo Kim, Minje Kim
Comments: 5 pages, 5 figures, under review
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)

In realistic speech enhancement settings for end-user devices, we often encounter only a few speakers and noise types that tend to reoccur in the specific acoustic environment. We propose a novel personalized speech enhancement method to adapt a compact denoising model to the test-time specificity. Our goal in this test-time adaptation is to utilize no clean speech target of the test speaker, thus fulfilling the requirement for zero-shot learning. To compensate for the lack of clean utterances, we employ the knowledge distillation framework. Instead of the missing clean utterance target, we distill the more advanced denoising results from an overly large teacher model, and use them as the pseudo target to train the small student model. This zero-shot learning procedure circumvents the process of collecting users' clean speech, a process that users are reluctant to comply with due to privacy concerns and the technical difficulty of recording clean voice. Experiments on various test-time conditions show that the proposed personalization method achieves significant performance gains compared to larger baseline networks trained on large speaker- and noise-agnostic datasets. In addition, since the compact personalized models can outperform larger general-purpose models, we claim that the proposed method performs model compression with no loss of denoising performance.

[6]  arXiv:2105.03568 [pdf, other]
Title: ChaRRNets: Channel Robust Representation Networks for RF Fingerprinting
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

We present complex-valued Convolutional Neural Networks (CNNs) for RF fingerprinting that go beyond translation invariance and appropriately account for the inductive bias with respect to multipath propagation channels, a phenomenon that is specific to the fields of wireless signal processing and communications. We focus on the problem of fingerprinting wireless IoT devices in-the-wild using Deep Learning (DL) techniques. Under these real-world conditions, the multipath environments represented in the train and test sets will be different. These differences are due to the physics governing the propagation of wireless signals, as well as the limitations of practical data collection campaigns. Our approach follows a group-theoretic framework, leverages prior work on DL on manifold-valued data, and extends this prior work to the wireless signal processing domain. We introduce the Lie group of transformations that a signal experiences under the multipath propagation model and define operations that are equivariant and invariant to the frequency response of a Finite Impulse Response (FIR) filter to build a ChaRRNet. We present results on synthetic and real-world datasets, benchmarked against a strong baseline model, that show the efficacy of our approach. Our results provide evidence of the benefits of incorporating appropriate wireless domain biases into DL models. We hope to spur new work in the area of robust RF machine learning, as the 5G revolution increases demand for enhanced security mechanisms.

[7]  arXiv:2105.03583 [pdf]
Title: Domestic activities clustering from audio recordings using convolutional capsule autoencoder network
Comments: 5 pages, 2 figures, 5 tables, Accepted by IEEE ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

Recent efforts have been made on domestic activities classification from audio recordings, especially the works submitted to the DCASE (Detection and Classification of Acoustic Scenes and Events) challenge since 2018. In contrast, few studies have addressed domestic activities clustering, which is a newly emerging problem. Domestic activities clustering from audio recordings aims at merging audio clips which belong to the same class of domestic activity into a single cluster. Domestic activities clustering is an effective way for unsupervised estimation of daily activities performed in a home environment. In this study, we propose a method for domestic activities clustering using a convolutional capsule autoencoder network (CCAN). In the method, the deep embeddings are learned by the autoencoder in the CCAN, while the deep embeddings which belong to the same class of domestic activities are merged into a single cluster by a clustering layer in the CCAN. Evaluated on a public dataset adopted in DCASE-2018 Task 5, the results show that the proposed method outperforms state-of-the-art methods in terms of the metrics of clustering accuracy and normalized mutual information.

[8]  arXiv:2105.03630 [pdf, other]
Title: A Phase Theory of MIMO LTI Systems
Subjects: Systems and Control (eess.SY)

In this paper, we introduce a definition of phase response for a class of multi-input multi-output (MIMO) linear time-invariant (LTI) systems whose frequency responses are (semi-)sectorial at all frequencies. The newly defined phase concept subsumes the well-known notions of positive real systems and negative imaginary systems. We formulate a small phase theorem for feedback stability, which complements the celebrated small gain theorem. The small phase theorem lays the foundation of a phase theory of MIMO systems. We also discuss time-domain interpretations of phase-bounded systems via both energy signal analysis and power signal analysis. In addition, a sectored real lemma is derived for the computation of MIMO phases, which serves as a natural counterpart of the bounded real lemma.

[9]  arXiv:2105.03643 [pdf, other]
Title: Latency-Controlled Neural Architecture Search for Streaming Speech Recognition
Comments: Submitted to INTERSPEECH 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

Recently, neural architecture search (NAS) has attracted much attention and has been explored for automatic speech recognition (ASR). Our prior work has shown promising results compared with hand-designed neural networks. In this work, we focus on streaming ASR scenarios and propose the latency-controlled NAS for acoustic modeling. First, based on the vanilla neural architecture, normal cells are altered to be causal cells, in order to control the total latency of the neural network. Second, a revised operation space with a smaller receptive field is proposed to generate the final architecture with low latency. Extensive experiments show that: 1) Based on the proposed neural architecture, the neural networks with a medium latency of 550ms (millisecond) and a low latency of 190ms can be learned in the vanilla and revised operation space respectively. 2) For the low latency setting, the evaluation network can achieve more than 19% (average on the four test sets) relative improvements compared with the hybrid CLDNN baseline, on a 10k-hour large-scale dataset. Additional 11% relative improvements can be achieved if the latency of the neural network is relaxed to the medium latency setting.

[10]  arXiv:2105.03660 [pdf, other]
Title: Deep learning of nanopore sensing signals using a bi-path network
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Biological Physics (physics.bio-ph)

Temporary changes in electrical resistance of a nanopore sensor caused by translocating target analytes are recorded as a sequence of pulses on current traces. Prevalent algorithms for feature extraction in pulse-like signals lack objectivity because empirical amplitude thresholds are user-defined to single out the pulses from the noisy background. Here, we use deep learning for feature extraction based on a bi-path network (B-Net). After training, the B-Net acquires the prototypical pulses and the ability of both pulse recognition and feature extraction without a priori assigned parameters. The B-Net performance is evaluated on generated datasets and further applied to experimental data of DNA and protein translocation. The B-Net results show remarkably small relative errors and stable trends. The B-Net is further shown capable of processing data with a signal-to-noise ratio equal to one, an impossibility for threshold-based algorithms. The developed B-Net is generic for pulse-like signals beyond pulsed nanopore currents.

[11]  arXiv:2105.03672 [pdf, other]
Title: Multi-Sensor Data Fusion for Accurate Traffic Speed and Travel Time Reconstruction
Comments: 20 pages, 9 figures, presented at the 2021 Annual Meeting of the Transportation Research Board (TRB)
Subjects: Signal Processing (eess.SP); Physics and Society (physics.soc-ph)

This paper studies the joint reconstruction of traffic speeds and travel times by fusing sparse sensor data. Raw speed data from inductive loop detectors and floating cars as well as travel time measurements are combined using different fusion techniques. A novel fusion approach is developed which extends existing speed reconstruction methods to integrate low-resolution travel time data. Several state-of-the-art methods and the novel approach are evaluated on their performance in reconstructing traffic speeds and travel times using various combinations of sensor data. Algorithms and sensor setups are evaluated with real loop detector, floating car and Bluetooth data collected during severe congestion on German freeway A9. Two main aspects are examined: (i) which algorithm provides the most accurate result depending on the used data and (ii) which type of sensor and which combination of sensors yields higher estimation accuracies. Results show that, overall, the novel approach applied to a combination of floating-car data and loop data provides the best speed and travel time accuracy. Furthermore, a fusion of sources improves the reconstruction quality in many, but not all cases. In particular, Bluetooth data only provide a benefit for reconstruction purposes if integrated distinctively.

[12]  arXiv:2105.03678 [pdf, other]
Title: Nearly Minimax-Optimal Rates for Noisy Sparse Phase Retrieval via Early-Stopped Mirror Descent
Comments: arXiv admin note: text overlap with arXiv:2010.10168
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Machine Learning (stat.ML)

This paper studies early-stopped mirror descent applied to noisy sparse phase retrieval, which is the problem of recovering a $k$-sparse signal $\mathbf{x}^\star\in\mathbb{R}^n$ from a set of quadratic Gaussian measurements corrupted by sub-exponential noise. We consider the (non-convex) unregularized empirical risk minimization problem and show that early-stopped mirror descent, when equipped with the hyperbolic entropy mirror map and proper initialization, achieves a nearly minimax-optimal rate of convergence, provided the sample size is at least of order $k^2$ (modulo logarithmic term) and the minimum (in modulus) non-zero entry of the signal is on the order of $\|\mathbf{x}^\star\|_2/\sqrt{k}$. Our theory leads to a simple algorithm that does not rely on explicit regularization or thresholding steps to promote sparsity. More generally, our results establish a connection between mirror descent and sparsity in the non-convex problem of noisy sparse phase retrieval, adding to the literature on early stopping that has mostly focused on non-sparse, Euclidean, and convex settings via gradient descent. Our proof combines a potential-based analysis of mirror descent with a quantitative control on a variational coherence property that we establish along the path of mirror descent, up to a prescribed stopping time.
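As a rough illustration of the kind of update this abstract refers to, the sketch below applies mirror descent with the hyperbolic entropy mirror map to an unregularized quadratic-measurement risk. It assumes the standard form of that mirror map, whose gradient is arcsinh(x_i / beta), and the risk F(x) = 1/(4m) * sum_j ((a_j . x)^2 - y_j)^2. The small fixed measurement vectors (standing in for Gaussian draws), the noiseless observations, the initial point, `beta` and `step_size` are all illustrative assumptions, not values from the paper.

```python
import math

def risk(x, measurements, observations):
    """Empirical risk F(x) = 1/(4m) * sum_j ((a_j . x)^2 - y_j)^2."""
    m = len(measurements)
    total = 0.0
    for a, y in zip(measurements, observations):
        inner = sum(ai * xi for ai, xi in zip(a, x))
        total += (inner * inner - y) ** 2
    return total / (4.0 * m)

def risk_gradient(x, measurements, observations):
    """Gradient of F: 1/m * sum_j ((a_j . x)^2 - y_j) * (a_j . x) * a_j."""
    m = len(measurements)
    grad = [0.0] * len(x)
    for a, y in zip(measurements, observations):
        inner = sum(ai * xi for ai, xi in zip(a, x))
        scale = (inner * inner - y) * inner / m
        for i, ai in enumerate(a):
            grad[i] += scale * ai
    return grad

def hypentropy_step(x, grad, step_size, beta):
    """Mirror step: x_i <- beta * sinh(arcsinh(x_i / beta) - step_size * grad_i)."""
    return [beta * math.sinh(math.asinh(xi / beta) - step_size * gi)
            for xi, gi in zip(x, grad)]

# Toy 1-sparse signal and fixed (non-random) measurement vectors.
x_star = [1.0, 0.0, 0.0]
A = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [1, 0, 1], [0, 1, 1]]
y = [sum(ai * xi for ai, xi in zip(a, x_star)) ** 2 for a in A]

x0 = [0.5, 0.0, 0.0]          # assumed initialization
beta, step_size = 0.01, 0.5   # assumed hyperparameters
x1 = hypentropy_step(x0, risk_gradient(x0, A, y), step_size, beta)

# "Early stopping" here is simply a fixed iteration budget (an assumption).
x = x0
for _ in range(50):
    x = hypentropy_step(x, risk_gradient(x, A, y), step_size, beta)
```

Because the update acts on arcsinh(x_i / beta), coordinates that start near zero move only slowly, which is one informal way to see why no explicit thresholding step appears in the loop.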

[13]  arXiv:2105.03679 [pdf, other]
Title: EZCrop: Energy-Zoned Channels for Robust Output Pruning
Subjects: Image and Video Processing (eess.IV); Machine Learning (cs.LG)

Recent results have revealed an interesting observation in a trained convolutional neural network (CNN), namely, the rank of a feature map channel matrix remains surprisingly constant regardless of the input images. This has led to an effective rank-based channel pruning algorithm, yet the constant rank phenomenon remains mysterious and unexplained. This work aims at demystifying and interpreting such rank behavior from a frequency-domain perspective, which as a bonus suggests an extremely efficient Fast Fourier Transform (FFT)-based metric for measuring channel importance without explicitly computing its rank. We achieve remarkable CNN channel pruning based on this analytically sound and computationally efficient metric and adopt it for repetitive pruning to demonstrate robustness via our scheme named Energy-Zoned Channels for Robust Output Pruning (EZCrop), which shows consistently better results than other state-of-the-art channel pruning methods.

[14]  arXiv:2105.03695 [pdf, ps, other]
Title: LPVcore: MATLAB Toolbox for LPV Modelling, Identification and Control
Subjects: Systems and Control (eess.SY)

This paper describes the LPVcore software package for MATLAB developed to model, simulate, estimate and control systems via linear parameter-varying (LPV) input-output (IO), state-space (SS) and linear fractional (LFR) representations. In the LPVcore toolbox, basis affine parameter-varying matrix functions are implemented to enable users to represent LPV systems in a global setting, i.e., for time-varying scheduling trajectories. This is a key difference compared to other software suites that use a grid or only LFR-based representations. The paper contains an overview of functions in the toolbox to simulate and identify IO, SS and LFR representations. Based on various prediction-error minimization methods, a comprehensive example is given on the identification of a DC motor with an unbalanced disc, demonstrating the capabilities of the toolbox. The software and examples are available on www.lpvcore.net.

[15]  arXiv:2105.03727 [pdf]
Title: An interstellar communication method: system design and observations
Comments: 22 pages, 22 figures
Subjects: Signal Processing (eess.SP); Instrumentation and Methods for Astrophysics (astro-ph.IM)

A system of synchronized radio telescopes is utilized to search for hypothetical wide bandwidth interstellar communication signals. Transmitted signals are hypothesized to have characteristics that enable high channel capacity and minimally low energy per information bit, while containing energy-efficient signal elements that are readily discoverable, distinct from random noise. A hypothesized transmitter signal is described. Signal reception and discovery processes are detailed. Observations using individual and multiple synchronized radio telescopes, during 2017 - 2021, are described. Conclusions and further work are suggested.

[16]  arXiv:2105.03738 [pdf, ps, other]
Title: Structured Covariance Matrix Estimation with Missing-Data for Radar Applications via Expectation-Maximization
Subjects: Signal Processing (eess.SP)

Structured covariance matrix estimation in the presence of missing data is addressed in this paper with emphasis on radar signal processing applications. After a motivation of the study, the array model is specified and the problem of computing the maximum likelihood estimate of a structured covariance matrix is formulated. A general procedure to optimize the observed-data likelihood function is developed resorting to the expectation-maximization algorithm. The corresponding convergence properties are thoroughly established and the rate of convergence is analyzed. The estimation technique is contextualized for two practically relevant radar problems: beamforming and detection of the number of sources. In the former case an adaptive beamformer leveraging the EM-based estimator is presented; in the latter, detection techniques generalizing the classic Akaike information criterion, minimum description length, and Hannan-Quinn information criterion, are introduced. Numerical results are finally presented to corroborate the theoretical study.

[17]  arXiv:2105.03742 [pdf, other]
Title: ChainNet: Neural Network-Based Successive Spectral Analysis
Subjects: Signal Processing (eess.SP)

We discuss a new neural network-based direction of arrival estimation scheme that tackles the estimation task as a multidimensional classification problem. The proposed estimator uses a classification chain with as many stages as the number of sources. Each stage is a multiclass classification network that estimates the position of one of the sources. This approach can be interpreted as the approximation of a successive evaluation of the maximum a posteriori estimator. By means of simulations for fully sampled antenna arrays and systems with subarray sampling, we show that it is able to outperform existing estimation techniques in terms of accuracy, while maintaining a very low computational complexity.

[18]  arXiv:2105.03767 [pdf]
Title: Aerospace Sliding Mode Control Toolbox: Relative Degree Approach with Resource Prospector Lander and Launch Vehicle Case Studies
Authors: S. Kode, Y. Shtessel (Senior Member IEEE), A. Levant (Senior Member IEEE), J. Rakoczy, M. Hannan, J. Orr
Subjects: Systems and Control (eess.SY)

Conventional sliding mode control and observation techniques are widely used in aerospace applications, including aircraft, UAVs, launch vehicles, missile interceptors, and hypersonic missiles. This work is dedicated to creating a MATLAB-based sliding mode controller design and simulation software toolbox that aims to support aerospace vehicle applications. An architecture of the aerospace sliding mode control toolbox (SMC Aero) using the relative degree approach is proposed. The SMC Aero libraries include first-order sliding mode control (1-SMC), second-order sliding mode control (2-SMC), higher-order sliding mode (HOSM) control (either fixed gain or adaptive), as well as higher-order sliding mode differentiators. The efficacy of the SMC Aero toolbox is confirmed in two case studies: controlling and simulating resource prospector lander (RPL) soft landing on the Moon and launch vehicle (LV) attitude control in ascent mode.

[19]  arXiv:2105.03774 [pdf, ps, other]
Title: Study of List-Based OMP and an Enhanced Model for Direction Finding with Non-Uniform Arrays
Comments: 6 figures, 8 pages
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

This paper proposes an enhanced coarray transformation model (EDCTM) and a mixed greedy maximum likelihood algorithm called List-Based Maximum Likelihood Orthogonal Matching Pursuit (LBML-OMP) for direction-of-arrival estimation with non-uniform linear arrays (NLAs). The proposed EDCTM approach obtains improved estimates when Khatri-Rao product-based models are used to generate difference coarrays under the assumption of uncorrelated sources. In the proposed LBML-OMP technique, for each iteration a set of candidates is generated based on the correlation-maximization between the dictionary and the residue vector. LBML-OMP then chooses the best candidate based on a reduced-complexity asymptotic maximum likelihood decision rule. Simulations show the improved results of EDCTM over existing approaches and that LBML-OMP outperforms existing sparse recovery algorithms as well as Spatial Smoothing Multiple Signal Classification with NLAs.

[20]  arXiv:2105.03809 [pdf, other]
Title: Superresolution photoacoustic tomography using random speckle illumination and second order moments
Comments: 9 pages, 5 figures
Subjects: Signal Processing (eess.SP); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV); Medical Physics (physics.med-ph)

Idier et al. [IEEE Trans. Comput. Imaging 4(1), 2018] propose a method which achieves superresolution in the microscopy setting by leveraging random speckle illumination and knowledge about statistical second order moments for the illumination patterns and model noise. This is achieved without any assumptions on the sparsity of the imaged object. In this paper, we show that their technique can be extended to photoacoustic tomography. We propose a simple algorithm for doing the reconstruction which only requires a small number of linear algebra steps. It is therefore much faster than the iterative method used by Idier et al. We also propose a new representation of the imaged object based on Dirac delta expansion functions.

[21]  arXiv:2105.03828 [pdf, other]
Title: Impacts of Privately Owned Electric Vehicles on Distribution System Resilience: A Multi-agent Optimization Approach
Subjects: Systems and Control (eess.SY)

We investigate the effects of private electric vehicles (EVs) on the resilience of distribution systems after disruptions. We propose a framework of network-based multi-agent optimization problems with equilibrium constraints (N-MOPEC) to consider the decentralized decision making of stakeholders in transportation and energy systems. To solve the high-dimensional non-convex problem, we develop an efficient computational algorithm based on exact convex reformulation. Numerical studies are conducted to illustrate the effectiveness of our modeling and computational approach and to draw policy insights. The proposed modeling and computational strategies could provide a solid foundation for the future study of power system resilience with private EVs in coupled transportation and power networks.

[22]  arXiv:2105.03847 [pdf]
Title: Automatic segmentation of vertebral features on ultrasound spine images using Stacked Hourglass Network
Comments: 9 pages, 5 figures
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Objective: The spinous process angle (SPA) is one of the essential parameters to denote three-dimensional (3-D) deformity of the spine. We propose an automatic segmentation method based on Stacked Hourglass Network (SHN) to detect the spinous processes (SP) on ultrasound (US) spine images and to measure the SPAs of clinical scoliotic subjects. Methods: The network was trained to detect vertebral SP and laminae as five landmarks on 1200 ultrasound transverse images and validated on 100 images. All the processed transverse images with highlighted SP and laminae were reconstructed into a 3-D image volume, and the SPAs were measured on the projected coronal images. The trained network was tested on 400 images by calculating the percentage of correct keypoints (PCK), and the SPA measurements were evaluated on 50 scoliotic subjects by comparing the results from US images and radiographs. Results: The trained network achieved a high average PCK (86.8%) on the test datasets; in particular, the PCK of SP detection was 90.3%. The SPAs measured from US and radiographic methods showed good correlation (r>0.85), and the mean absolute difference (MAD) between the two modalities was 3.3°, which was less than the clinical acceptance error (5°). Conclusion: The vertebral features can be accurately segmented on US spine images using SHN, and the measurement results of SPA from US data were comparable to the gold standard from radiography.
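The percentage of correct keypoints (PCK) used as the test metric above can be sketched as follows. The abstract does not state the correctness criterion, so this example assumes a fixed Euclidean distance threshold in pixels; the coordinates and the threshold value are made up for illustration.

```python
import math

def pck(predicted, ground_truth, threshold):
    """Fraction of predicted keypoints lying within `threshold` of the truth."""
    if not predicted or len(predicted) != len(ground_truth):
        raise ValueError("need two non-empty keypoint lists of equal length")
    correct = sum(1 for p, g in zip(predicted, ground_truth)
                  if math.dist(p, g) <= threshold)
    return correct / len(predicted)

# Hypothetical landmark positions in pixel coordinates.
predicted = [(10, 10), (20, 21), (30, 40), (50, 50)]
ground_truth = [(10, 12), (20, 20), (30, 30), (51, 50)]

# Distances are 2, 1, 10 and 1 pixels, so three of four fall within 5 pixels.
score = pck(predicted, ground_truth, threshold=5.0)
```

Some PCK variants normalize the threshold by a reference length such as the image or object size; that choice would replace the fixed `threshold` argument here.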

[23]  arXiv:2105.03877 [pdf]
Title: Non-iterative Optimization Algorithm for Active Distribution Grids Considering Uncertainty of Feeder Parameters
Authors: J. Wu, M. Liu, W. Lu, K. Xie, M. Xie
Comments: 9 pages, 10 figures. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Systems and Control (eess.SY)

To cope with fast-fluctuating distributed energy resources (DERs) and uncontrolled loads, this paper formulates a time-varying optimization problem for distribution grids with DERs and develops a novel non-iterative algorithm to track the optimal solutions. Different from existing methods, the proposed approach does not require iterations during the sampling interval. It only needs to perform a single one-step calculation at each interval to obtain the evolution of the optimal trajectory, which demonstrates fast calculation and online-tracking capability with an asymptotically vanishing error. Specifically, the designed approach contains two terms: a prediction term tracking the change in the optimal solution based on the time-varying nature of system power, and a correction term pushing the solution toward the optimum based on Newton's method. Moreover, the proposed algorithm can be applied in the absence of an accurate network model by leveraging voltage measurements to identify the true voltage sensitivity parameters. Simulations for an illustrative distribution network are provided to validate the approach.
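The prediction and correction terms described above can be illustrated on a scalar toy problem. The sketch below tracks the minimizer of an assumed cost f(x, t) = 0.5 (x - sin t)^2 + 0.25 x^4 with one update per sampling interval and compares it against a Newton correction alone. The cost function, the step length and the horizon are assumptions chosen for illustration; the paper's grid model, constraints and sensitivity identification are not represented.

```python
import math

def grad_x(x, t):
    """Partial derivative of f with respect to x."""
    return x - math.sin(t) + x ** 3

def hess_xx(x):
    """Second partial derivative of f with respect to x (always >= 1)."""
    return 1.0 + 3.0 * x * x

def grad_tx(t):
    """Mixed partial derivative of f with respect to t and x."""
    return -math.cos(t)

def track(step, horizon, use_prediction):
    """Run one update per interval; return the worst gradient residual seen."""
    x, t, worst = 0.0, 0.0, 0.0
    for _ in range(int(round(horizon / step))):
        correction = grad_x(x, t)                       # Newton correction term
        prediction = step * grad_tx(t) if use_prediction else 0.0
        x -= (correction + prediction) / hess_xx(x)     # single one-step update
        t += step
        worst = max(worst, abs(grad_x(x, t)))           # residual at the new time
    return worst

with_prediction = track(step=0.01, horizon=2 * math.pi, use_prediction=True)
newton_only = track(step=0.01, horizon=2 * math.pi, use_prediction=False)
```

In this toy setting the residual with the prediction term scales with the square of the step length, while the correction-only variant lags by roughly one step of the optimizer's drift.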

[24]  arXiv:2105.03905 [pdf, other]
Title: Security Concerns on Machine Learning Solutions for 6G Networks in mmWave Beam Prediction
Comments: 13 Pages, under review. arXiv admin note: substantial text overlap with arXiv:2103.07268
Subjects: Signal Processing (eess.SP); Cryptography and Security (cs.CR); Machine Learning (cs.LG)

6G -- sixth generation -- is the latest cellular technology currently under development for wireless communication systems. In recent years, machine learning algorithms have been applied widely in various fields, such as healthcare, transportation, energy, autonomous cars, and many more. These algorithms have also been used in communication technologies to improve system performance in terms of frequency spectrum usage, latency, and security. With the rapid development of machine learning techniques, especially deep learning, it is critical to take security concerns into account when applying the algorithms. While machine learning algorithms offer significant advantages for 6G networks, security concerns about Artificial Intelligence (AI) models have typically been ignored by the scientific community so far. However, security is also a vital part of AI algorithms, because the AI model itself can be poisoned by attackers. This paper proposes a mitigation method, based on adversarial learning, for adversarial attacks against proposed 6G machine learning models for millimeter-wave (mmWave) beam prediction. The main idea behind adversarial attacks against machine learning models is to produce faulty results by manipulating trained deep learning models for 6G applications for mmWave beam prediction. We also present the performance of the adversarial learning mitigation method for 6G security in the mmWave beam prediction application under a fast gradient sign method attack. The mean square errors (MSE) of the defended model under attack are very close to those of the undefended model without attack.
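The fast gradient sign method named in this abstract perturbs an input by a small step in the direction of the sign of the loss gradient. The sketch below shows that step on a tiny linear regressor with a squared-error loss, which stands in for the paper's deep beam-prediction models; the weights, input, target and `epsilon` are illustrative assumptions.

```python
def predict(weights, x):
    """Linear model output w . x."""
    return sum(w * xi for w, xi in zip(weights, x))

def squared_error(weights, x, target):
    """Loss (w . x - target)^2 for a single sample."""
    return (predict(weights, x) - target) ** 2

def input_gradient(weights, x, target):
    """Gradient of the squared error with respect to the input x."""
    residual = predict(weights, x) - target
    return [2.0 * residual * w for w in weights]

def sign(value):
    """Return -1, 0 or 1 according to the sign of value."""
    return (value > 0) - (value < 0)

def fgsm(weights, x, target, epsilon):
    """Adversarial input x + epsilon * sign(gradient of loss w.r.t. x)."""
    grad = input_gradient(weights, x, target)
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]

weights = [1.0, -2.0]   # assumed trained model
x_clean = [0.5, 0.5]
target = 0.0

x_adv = fgsm(weights, x_clean, target, epsilon=0.1)
loss_clean = squared_error(weights, x_clean, target)
loss_adv = squared_error(weights, x_adv, target)
```

In a typical adversarial training setup, inputs like `x_adv` are added to the training data so the model learns to keep its error low under such perturbations; whether the paper follows exactly that recipe is not stated in the abstract.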

[25]  arXiv:2105.03939 [pdf, other]
Title: Lightweight Image Super-Resolution with Hierarchical and Differentiable Neural Architecture Search
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Single Image Super-Resolution (SISR) tasks have achieved significant performance with deep neural networks. However, the large number of parameters in CNN-based methods for SISR tasks requires heavy computation. Although several efficient SISR models have been recently proposed, most are handcrafted and thus lack flexibility. In this work, we propose a novel differentiable Neural Architecture Search (NAS) approach on both the cell-level and network-level to search for lightweight SISR models. Specifically, the cell-level search space is designed based on an information distillation mechanism, focusing on the combinations of lightweight operations and aiming to build a more lightweight and accurate SR structure. The network-level search space is designed to consider the feature connections among the cells and aims to find which information flow benefits the cell most to boost the performance. Unlike the existing Reinforcement Learning (RL) or Evolutionary Algorithm (EA) based NAS methods for SISR tasks, our search pipeline is fully differentiable, and the lightweight SISR models can be efficiently searched on both the cell-level and network-level jointly on a single GPU. Experiments show that our methods can achieve state-of-the-art performance on the benchmark datasets in terms of PSNR, SSIM, and model complexity with merely 68G Multi-Adds for $\times 2$ and 18G Multi-Adds for $\times 4$ SR tasks. Code will be available at https://github.com/DawnHH/DLSR-PyTorch.

[26]  arXiv:2105.03973 [pdf, other]
Title: Perturbation-based Frequency Domain Linear and Nonlinear Noise Estimation
Comments: 7 Pages
Subjects: Systems and Control (eess.SY)

In this paper, a new method for the separation of noise categories based on Four-Wave Mixing is presented.
The theoretical analysis is grounded in the Gaussian Noise model and verified by split-step simulations. The noise categories react differently to the introduced perturbations; by performing a set of perturbations, the behaviour of the different categories can be separated by means of a least-squares fitting. Given that ASE is independent of the induced perturbations, it is possible to separate the noise contributions. The analysis includes constant and variable power perturbations.
The estimation of the noise categories is discussed from two points of view: NSR evolution post-DSP processing, and over the power spectral density in a notched region. The NSR estimation can only be performed at reception, whereas the power spectral density approach can be performed along the optical link if a high resolution Optical Spectrum Analyzer is available.
Additionally, we perform a simple experimental verification using two WaveLogic 3 transceivers for the NSR, successfully estimating the noise contributions.

[27]  arXiv:2105.03995 [pdf, other]
Title: Acute Lymphoblastic Leukemia Detection from Microscopic Images Using Weighted Ensemble of Convolutional Neural Networks
Comments: 31 pages, 9 figures
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Acute Lymphoblastic Leukemia (ALL) is a blood cell cancer characterized by numerous immature lymphocytes. Even though automation in ALL prognosis is an essential aspect of cancer diagnosis, it is challenging due to the morphological correlation between malignant and normal cells. The traditional ALL classification strategy demands experienced pathologists to carefully read the cell images, which is arduous, time-consuming, and often suffers from inter-observer variation. This article automates the ALL detection task from microscopic cell images, employing deep Convolutional Neural Networks (CNNs). We explore the weighted ensemble of different deep CNNs to recommend a better ALL cell classifier. The weights for the ensemble candidate models are estimated from their corresponding metrics, such as accuracy, F1-score, AUC, and kappa values. Various data augmentations and pre-processing are incorporated for achieving a better generalization of the network. We utilize the publicly available C-NMC-2019 ALL dataset to conduct all the comprehensive experiments. Our proposed weighted ensemble model, using the kappa values of the ensemble candidates as their weights, achieves a weighted F1-score of 88.6 %, a balanced accuracy of 86.2 %, and an AUC of 0.941 on the preliminary test set. The qualitative results displaying the gradient class activation maps confirm that the introduced model has a concentrated learned region. In contrast, the ensemble candidate models, such as Xception, VGG-16, DenseNet-121, MobileNet, and InceptionResNet-V2, separately produce coarse and scattered learned areas for most example cases. Since the proposed kappa value-based weighted ensemble yields a better result for the aimed task in this article, it can be tried in other domains of medical diagnostic applications.
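
A kappa-weighted ensemble of the kind described above might be sketched as follows. The abstract only states that kappa values serve as weights; normalizing them to sum to one, clipping negative values to zero, and every function name below are assumptions made for illustration.

```python
def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement between labels and predictions beyond chance."""
    n = len(y_true)
    labels = set(y_true) | set(y_pred)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    expected = sum((y_true.count(c) / n) * (y_pred.count(c) / n) for c in labels)
    # Undefined when expected == 1 (a single class everywhere); not handled here.
    return (observed - expected) / (1.0 - expected)

def kappa_weights(y_true, validation_preds):
    """One weight per candidate model, proportional to its validation kappa.

    Negative kappas are clipped to zero (an assumption); at least one model
    must have a positive kappa for the normalization to be defined.
    """
    kappas = [max(cohen_kappa(y_true, p), 0.0) for p in validation_preds]
    total = sum(kappas)
    return [k / total for k in kappas]

def ensemble_probs(weights, model_probs):
    """Weighted average of per-model class probabilities for one sample."""
    n_classes = len(model_probs[0])
    return [sum(w * p[c] for w, p in zip(weights, model_probs))
            for c in range(n_classes)]
```

For example, with validation labels [0, 0, 1, 1] and two models predicting [0, 0, 1, 1] and [0, 1, 1, 1], the kappas are 1.0 and 0.5, so the first model receives twice the weight of the second.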

[28]  arXiv:2105.04014 [pdf, other]
Title: DiagSet: a dataset for prostate cancer histopathological image classification
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Cancer diseases constitute one of the most significant societal challenges. In this paper we introduce a novel histopathological dataset for prostate cancer detection. The proposed dataset, consisting of over 2.6 million tissue patches extracted from 430 fully annotated scans, 4675 scans with assigned binary diagnosis, and 46 scans with diagnosis given independently by a group of histopathologists, can be found at https://ai-econsilio.diag.pl. Furthermore, we propose a machine learning framework for detection of cancerous tissue regions and prediction of scan-level diagnosis, utilizing thresholding and statistical analysis to abstain from the decision in uncertain cases. During the experimental evaluation we identify several factors negatively affecting the performance of considered models, such as presence of label noise, data imbalance, and quantity of data, that can serve as a basis for further research. The proposed approach, composed of ensembles of deep neural networks operating on the histopathological scans at different scales, achieves 94.6% accuracy in patch-level recognition, and is compared in a scan-level diagnosis with 9 human histopathologists.

[29]  arXiv:2105.04041 [pdf, ps, other]
Title: Lyapunov-Krasovskii functionals for some classes of nonlinear time delay systems
Comments: Submitted for presentation in 2021 Conference on Decision and Control (CDC)
Subjects: Systems and Control (eess.SY); Dynamical Systems (math.DS)

In this contribution, we study a homogeneous class of nonlinear time delay systems with time-varying perturbations. Using the Lyapunov-Krasovskii approach, we introduce a functional that leads to perturbation conditions matching those obtained previously in the Razumikhin framework. The functionals are applied to the estimation of the domain of attraction and of the system solutions. An illustrative example is given.

[30]  arXiv:2105.04077 [pdf, other]
Title: Dynamic Multichannel Access via Multi-agent Reinforcement Learning: Throughput and Fairness Guarantees
Comments: 20 pages, 12 figures
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

We consider a multichannel random access system in which each user accesses a single channel at each time slot to communicate with an access point (AP). Users arrive at the system at random, are active for a certain number of time slots, and then disappear from the system. In such a dynamic network environment, we propose a distributed multichannel access protocol based on multi-agent reinforcement learning (RL) to improve both throughput and fairness between active users. Unlike previous approaches that adjust channel access probabilities at each time slot, the proposed RL algorithm deterministically selects a set of channel access policies for several consecutive time slots. To effectively reduce the complexity of the proposed RL algorithm, we adopt a branching dueling Q-network architecture and propose an efficient training methodology for producing proper Q-values over time-varying user sets. We perform extensive simulations on realistic traffic environments and demonstrate that the proposed online learning improves both throughput and fairness compared to the conventional RL approaches and centralized scheduling policies.

[31]  arXiv:2105.04083 [pdf, other]
Title: The Behavior of Internet Traffic for Internet Services during COVID-19 Pandemic Scenario
Comments: 4 pages, 2 figures, Submitted to XXXIX Simpósio Brasileiro de Telecomunicações e Processamento de Sinais, SBrT 2021, Fortaleza, CE, Brasil
Subjects: Signal Processing (eess.SP); Networking and Internet Architecture (cs.NI)

Since the end of 2019, the SARS-CoV-2 virus, which causes COVID-19, has spread rapidly around the world, forcing many governments to impose restrictions or lockdowns to combat the pandemic. With restrictions on people's movement in almost all countries of the world, workers and students had to carry on their activities at home. As a result, people's behavior, habits, and the way they use the Internet changed significantly. Like office professionals, young people played an important role in this change, especially in the type of resources they used. As a result, the characterization and traffic of communication networks were affected in some way. In this perspective article, we draw on many available studies of the effect of COVID-19 on networks and investigate the effects on Internet traffic of using services such as video streaming, video conferencing, and gaming during the pandemic months of 2020.

[32]  arXiv:2105.04106 [pdf, other]
Title: Validation of image systems simulation technology using a Cornell Box
Subjects: Image and Video Processing (eess.IV); Graphics (cs.GR)

We describe and experimentally validate an end-to-end simulation of a digital camera. The simulation models the spectral radiance of 3D-scenes, formation of the spectral irradiance by multi-element optics, and conversion of the irradiance to digital values by the image sensor. We quantify the accuracy of the simulation by comparing real and simulated images of a precisely constructed, three-dimensional high dynamic range test scene. Validated end-to-end software simulation of a digital camera can accelerate innovation by reducing many of the time-consuming and expensive steps in designing, building and evaluating image systems.

[33]  arXiv:2105.04149 [pdf, other]
Title: IRS-Assisted Active Device Detection
Subjects: Signal Processing (eess.SP)

This paper studies intelligent reflecting surface (IRS) assisted active device detection. Since the locations of the devices are a priori unknown, optimal IRS beam alignment is not possible and a worst-case design for a given coverage area is developed. To this end, we propose a generalized likelihood ratio test (GLRT) detection scheme and an IRS phase-shift design that minimizes the worst-case probability of misdetection. In addition to the proposed optimization-based phase-shift design, we consider two alternative suboptimal designs based on closed-form expressions for the IRS phase shifts. Our performance analysis establishes the superiority of the optimization-based design, especially for large coverage areas. Furthermore, we investigate the impact of scatterers on the proposed line-of-sight based design using simulations.

[34]  arXiv:2105.04163 [pdf, other]
Title: Multi-Spectrally Constrained Transceiver Design against Signal-Dependent Interference
Comments: Submitted to IEEE Transactions on Signal Processing
Subjects: Signal Processing (eess.SP)

This paper focuses on the joint synthesis of constant envelope transmit signal and receive filter aimed at optimizing radar performance in signal-dependent interference and spectrally contested-congested environments. To ensure the desired Quality of Service (QoS) at each communication system, a precise control of the interference energy injected by the radar in each licensed/shared bandwidth is imposed. Besides, along with an upper bound to the maximum transmitted energy, constant envelope (with either arbitrary or discrete phases) and similarity constraints are forced to ensure compatibility with amplifiers operating in saturation regime and bestow relevant waveform features, respectively. To handle the resulting NP-hard design problems, new iterative procedures (with ensured convergence properties) are devised to account for continuous and discrete phase constraints, capitalizing on the Coordinate Descent (CD) framework. Two heuristic procedures are also proposed to perform valuable initializations. Numerical results are provided to assess the effectiveness of the conceived algorithms in comparison with the existing methods.

[35]  arXiv:2105.04164 [pdf, other]
Title: Communication coordination in network controllability
Subjects: Systems and Control (eess.SY); Physics and Society (physics.soc-ph)

Better understanding our ability to control an interconnected system of entities has been one of the central challenges in network science. The theories of node and edge controllability have been the main methodologies suggested to find the minimal set of nodes enabling control over the whole system's dynamics. While the focus is traditionally mostly on physical systems, there has been an increasing interest in control questions involving socioeconomic systems. However, surprisingly little attention has been given to the methods' underlying assumptions on control propagation, or communication assumptions, a crucial aspect in social contexts. In this paper, we show that node controllability contains a single message assumption, allowing no heterogeneity in communication to neighbouring nodes in a network. Edge controllability is shown to relax this communication assumption but aims to control the dynamics of the edge states and not the node states, thus answering a fundamentally different question. This makes comparisons of the results from the two methods nonsensical. To increase the applicability of controllability methodology to socioeconomic contexts, we provide guiding principles to choose the appropriate methodology and suggest new avenues for future theoretical work to encode more realistic communication assumptions.

[36]  arXiv:2105.04193 [pdf]
Title: Modelling of LIDAR sensor disturbances by solid airborne particles
Journal-ref: SIA SIMULATION NUMERIQUES, Apr 2021, Digital event, France
Subjects: Signal Processing (eess.SP)

This paper introduces a method for simulating, in real time, the disturbance of an automotive LIDAR by dust clouds caused by natural phenomena or by mechanical or man-made processes such as a traveling vehicle. In this study, we examine the interaction of an automotive LIDAR sensor with a dust cloud composed of solid particles. The main objective of this study is to provide industry and research laboratories with a simulation model that helps to study LIDAR performance in a dust-sand environment, with the capability to reproduce the problems encountered in degraded conditions and the ability to parameterize the degradation model. Based on industrial projects with passenger vehicle and truck manufacturers, we present the LIDAR sensor and its functionalities for perceiving objects in a scene (pedestrian, car, truck, ...) in clear or extreme weather conditions. Simulated and experimental data are compared and analyzed in this article. The features presented are evaluated according to their quality for object detection. This study can be applied to sensor post-processing algorithms (object recognition, tracking, data fusion...) and even to the design of cleaning systems.

[37]  arXiv:2105.04196 [pdf, other]
Title: AoI-Aware Resource Allocation for Platoon-Based C-V2X Networks via Multi-Agent Multi-Task Reinforcement Learning
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Multiagent Systems (cs.MA)

This paper investigates the problem of age of information (AoI) aware radio resource management for a platooning system. Multiple autonomous platoons exploit the cellular wireless vehicle-to-everything (C-V2X) communication technology to disseminate the cooperative awareness messages (CAMs) to their followers while ensuring timely delivery of safety-critical messages to the Road-Side Unit (RSU). Due to the challenges of dynamic channel conditions, centralized resource management schemes that require global information are inefficient and lead to large signaling overheads. Hence, we exploit a distributed resource allocation framework based on multi-agent reinforcement learning (MARL), where each platoon leader (PL) acts as an agent and interacts with the environment to learn its optimal policy. Existing MARL algorithms consider a holistic reward function for the group's collective success, which often ends up with unsatisfactory results and cannot guarantee an optimal policy for each agent. Consequently, motivated by the existing literature in RL, we propose a novel MARL framework that trains two critics with the following goals: A global critic which estimates the global expected reward and motivates the agents toward a cooperating behavior and an exclusive local critic for each agent that estimates the local individual reward. Furthermore, based on the tasks each agent has to accomplish, the individual reward of each agent is decomposed into multiple sub-reward functions where task-wise value functions are learned separately. Numerical results indicate our proposed algorithm's effectiveness compared with the conventional RL methods applied in this area.

[38]  arXiv:2105.04207 [pdf, other]
Title: Age of Information Aware VNF Scheduling in Industrial IoT Using Deep Reinforcement Learning
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

In delay-sensitive industrial internet of things (IIoT) applications, the age of information (AoI) is employed to characterize the freshness of information. Meanwhile, the emerging network function virtualization provides flexibility and agility for service providers to deliver a given network service using a sequence of virtual network functions (VNFs). However, suitable VNF placement and scheduling in these schemes is NP-hard, and finding a globally optimal solution by traditional approaches is complex. Recently, deep reinforcement learning (DRL) has appeared as a viable way to solve such problems. In this paper, we first utilize a single-agent, low-complexity compound-action actor-critic RL to cover both discrete and continuous actions and jointly minimize VNF cost and AoI in terms of network resources under end-to-end Quality of Service constraints. To surmount the single-agent capacity limitation for learning, we then extend our solution to a multi-agent DRL scheme in which agents collaborate with each other. Simulation results demonstrate that single-agent schemes significantly outperform the greedy algorithm in terms of average network cost and AoI. Moreover, the multi-agent solution decreases the average cost by dividing the tasks between the agents. However, it needs more iterations to learn because the agents must collaborate.

[39]  arXiv:2105.04269 [pdf, other]
Title: Weakly supervised pan-cancer segmentation tool
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

The vast majority of semantic segmentation approaches rely on pixel-level annotations that are tedious and time-consuming to obtain and suffer from significant inter- and intra-expert variability. To address these issues, recent approaches have leveraged categorical annotations at the slide level, which in general suffer from poor robustness and generalization. In this paper, we propose a novel weakly supervised multi-instance learning approach that deciphers quantitative slide-level annotations, which are fast to obtain and regularly present in clinical routine. The potential of the proposed approach is demonstrated for tumor segmentation of solid cancer subtypes. The proposed approach achieves superior performance in out-of-distribution, out-of-location, and out-of-domain testing sets.

[40]  arXiv:2105.04310 [pdf, other]
Title: Study on the temporal pooling used in deep neural networks for speaker verification
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

The x-vector architecture has recently achieved state-of-the-art results on the speaker verification task. This architecture incorporates a central layer, referred to as temporal pooling, which stacks statistical parameters of the acoustic frame distribution. This work highlights the significant effect of the temporal pooling content on the training dynamics and task performance. An evaluation with different pooling layers is conducted, that is, including different statistical measures of central tendency. Notably, 3rd and 4th moment-based statistics (skewness and kurtosis) are also tested to complement the usual mean and standard-deviation parameters. Our experiments show the influence of the pooling layer content on speaker verification performance, but also on several classification tasks (speaker, channel or text related), and better reveal the presence of information external to the speaker identity depending on the layer content.
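
The pooling statistics named above can be sketched in a few lines. This is a plain-Python illustration, not the authors' layer: the use of population (biased) moments, the non-excess kurtosis convention, and the function name are assumptions.

```python
import math

def statistics_pooling(frames):
    """Collapse T frame vectors of dimension D into one vector of size 4 * D.

    Output layout: [means..., standard deviations..., skewness..., kurtosis...].
    Population moments are used; kurtosis is the raw fourth standardized
    moment (about 3 for a Gaussian), not the excess kurtosis.
    """
    n_frames = len(frames)
    dim = len(frames[0])
    means, stds, skews, kurts = [], [], [], []
    for d in range(dim):
        column = [frame[d] for frame in frames]
        mu = sum(column) / n_frames
        m2 = sum((v - mu) ** 2 for v in column) / n_frames
        m3 = sum((v - mu) ** 3 for v in column) / n_frames
        m4 = sum((v - mu) ** 4 for v in column) / n_frames
        means.append(mu)
        stds.append(math.sqrt(m2))
        # A constant feature has no defined shape statistics; report zeros.
        skews.append(m3 / m2 ** 1.5 if m2 > 0 else 0.0)
        kurts.append(m4 / m2 ** 2 if m2 > 0 else 0.0)
    return means + stds + skews + kurts
```

Whatever the number of frames, the output length is fixed at four times the feature dimension, which is what lets a variable-length utterance feed fixed-size layers.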

[41]  arXiv:2105.04324 [pdf, other]
Title: Passivity-based control of mechanical systems with linear damping identification
Comments: Submission for 7th IFAC Workshop on Lagrangian and Hamiltonian Methods for Nonlinear Control
Subjects: Systems and Control (eess.SY)

We propose a control approach for a class of nonlinear mechanical systems to stabilize the system under study while ensuring that the oscillations of the transient response are reduced. The approach is twofold: (i) we apply our technique for linear viscous damping identification of the system to improve the accuracy of the selected control technique, and (ii) we implement a passivity-based controller to stabilize and reduce the oscillations by selecting the control parameters properly in accordance with the identified damping. Moreover, we provide an analysis for a particular passivity-based control approach that has been shown to be successful in reducing such oscillations. Also, we validate the methodology by implementing it experimentally on a planar manipulator.

[42]  arXiv:2105.04335 [pdf, other]
Title: Geometrical Characterization of Sensor Placement for Cone-Invariant and Multi-Agent Systems against Undetectable Zero-Dynamics Attacks
Comments: 8 figures
Subjects: Systems and Control (eess.SY)

Undetectable attacks are an important class of malicious attacks threatening the security of cyber-physical systems, which can modify a system's state but leave the system output measurements unaffected, and hence cannot be detected from the output. This paper studies undetectable attacks on cone-invariant systems and multi-agent systems. We first provide a general characterization of zero-dynamics attacks, which characterizes fully undetectable attacks targeting the non-minimum phase zeros of a system. This geometrical characterization makes it possible to develop a defense strategy seeking to place a minimal number of sensors to detect and counter the zero-dynamics attacks on the system's actuators. The detection and defense scheme amounts to computing a set containing potentially vulnerable actuator locations and nodes, and a defense union for feasible placement of sensors based on the geometrical properties of the cones under consideration.

[43]  arXiv:2105.04340 [pdf]
Title: Interaction Theory of Hazard-Target System
Comments: 28 pages, 9 figures, 3 tables
Subjects: Systems and Control (eess.SY)

Major accidents (e.g., the Space Shuttle Challenger disaster in the USA, the Bhopal Disaster in India, the Fukushima nuclear accident in Japan, the Tianjin Port fire and explosion accident in China) have occurred all over the world. Safety scientists are always trying to understand why these accidents happened and how to prevent them. Accident models and theories form the basis for many safety research fields and practices, such as investigation of accidents, design of safer systems, and decision making in safety-related fields. There is no universally accepted model with useful elements relating to understanding accident causation, although many accident causation models exist. Based on STAMP and RMF, we propose a new theory named the Interaction Theory of Hazard-Target System (ITHTS) that incorporates human, organisational and technological characteristics in the same framework. Accident analysis methods provide the necessary information to analyze an accident in a specific setting. In order to solve the issues that current accident analysis methods still face, we propose a new systemic accident analysis method based on ITHTS and STPA. We choose the Tianjin Port fire and explosion accident in China as a case study to demonstrate the viability of the Interaction Theory of Hazard-Target System and the applicability of the new accident analysis method. It is concluded that ITHTS can explain the phenomena in safety practice and that the new accident analysis method can be applied to the explanation and analysis of major accidents.

[44]  arXiv:2105.04356 [pdf, other]
Title: Coconut trees detection and segmentation in aerial imagery using mask region-based convolution neural network
Comments: Published in IET Computer Vision, 09 April 2021
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Food resources face severe damage under extraordinary situations of catastrophes such as earthquakes, cyclones, and tsunamis. Under such scenarios, speedy assessment of food resources from agricultural land is critical as it supports aid activity in the disaster-hit areas. In this article, a deep learning approach is presented for the detection and segmentation of coconut trees in aerial imagery provided through the AI competition organized by the World Bank in collaboration with OpenAerialMap and WeRobotics. A Mask Region-based Convolutional Neural Network (Mask R-CNN) approach was used for identification and segmentation of coconut trees. For the segmentation task, the Mask R-CNN model with ResNet50 and ResNet101 based architectures was used. Several experiments with different configuration parameters were performed and the best configuration for the detection of coconut trees with more than 90% confidence factor was reported. For the purpose of evaluation, the Microsoft COCO dataset evaluation metric, namely mean average precision (mAP), was used. An overall 91% mean average precision for coconut tree detection was achieved.

[45]  arXiv:2105.04492 [pdf, other]
Title: Practical Fingerprinting of RF Devices in the Wild
Comments: arXiv admin note: text overlap with arXiv:2104.00751
Subjects: Signal Processing (eess.SP)

We present a new RF fingerprinting technique for wireless emitters that is based on a simple, easily and efficiently retrainable Ridge Regression (RR) classifier. The RR learns to identify devices using bursts of waveform samples, conveniently transformed and preprocessed by delay-loop reservoirs. Deep delay Loop Reservoir Computing (DLR) is our processing architecture that supports general machine learning algorithms on resource-constrained devices by leveraging delay-loop reservoir computing (RC) and innovative architectures of loop trees. In prior work, we trained and evaluated DLR using high SNR device emissions in clean channels. We here demonstrate how to use DLR for IoT authentication by performing RF-based Specific Emitter Identification (SEI), even in the presence of fading channels and heavy in-band jamming by leveraging a matched filter (MF) extension, dubbed MF-DLR. We show that the MF processing improves the SEI performance of RR without the RC transformation (MF-RR), but the MF-DLR is more robust and applicable for addressing signatures beyond waveform transients (e.g. turn-on).

[46]  arXiv:2105.04529 [pdf, other]
Title: Identification of the nonlinear steering dynamics of an autonomous vehicle
Comments: Accepted to SYSID 2021 (revised with reviewer feedback)
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)

Automated driving applications require accurate vehicle-specific models to precisely predict and control the motion dynamics. However, modern vehicles have a wide array of digital and mechatronic components that are difficult to model, manufacturers do not disclose all details required for modelling, and even existing models of subcomponents require coefficient estimation to match the specific characteristics of each vehicle and their change over time. Hence, it is attractive to use data-driven modelling to capture the relevant vehicle dynamics and synthesise model-based control solutions. In this paper, we address identification of the steering system of an autonomous car based on measured data. We show that the underlying dynamics are highly nonlinear and challenging to capture, necessitating the use of data-driven methods that fuse the approximation capabilities of learning and the efficiency of dynamic system identification. We demonstrate that such a neural network based subspace-encoder method can successfully capture the underlying dynamics while other methods fall short of providing reliable results.

[47]  arXiv:2105.04532 [pdf, other]
Title: Improved Simultaneous Multi-Slice Functional MRI Using Self-supervised Deep Learning
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Signal Processing (eess.SP); Medical Physics (physics.med-ph)

Functional MRI (fMRI) is commonly used for interpreting neural activities across the brain. Numerous accelerated fMRI techniques aim to provide improved spatiotemporal resolutions. Among these, simultaneous multi-slice (SMS) imaging has emerged as a powerful strategy, becoming a part of large-scale studies, such as the Human Connectome Project. However, when SMS imaging is combined with in-plane acceleration for higher acceleration rates, conventional SMS reconstruction methods may suffer from noise amplification and other artifacts. Recently, deep learning (DL) techniques have gained interest for improving MRI reconstruction. However, these methods are typically trained in a supervised manner that necessitates fully-sampled reference data, which is not feasible in highly-accelerated fMRI acquisitions. Self-supervised learning that does not require fully-sampled data has recently been proposed and has shown similar performance to supervised learning. However, it has only been applied for in-plane acceleration. Furthermore, the effect of DL reconstruction on subsequent fMRI analysis remains unclear. In this work, we extend self-supervised DL reconstruction to SMS imaging. Our results on prospectively 10-fold accelerated 7T fMRI data show that self-supervised DL reduces reconstruction noise and suppresses residual artifacts. Subsequent fMRI analysis remains unaltered by DL processing, while the improved temporal signal-to-noise ratio produces higher coherence estimates between task runs.

Cross-lists for Tue, 11 May 21

[48]  arXiv:1811.12759 (cross-list from math.OC) [pdf, other]
Title: A Decentralized Event-Based Approach for Robust Model Predictive Control
Comments: 18 pages, 3 figures
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)

In this paper, we propose an event-based sampling policy to implement a constraint-tightening, robust MPC method. The proposed policy enjoys a computationally tractable design and is applicable to perturbed, linear time-invariant systems with polytopic constraints. In particular, the triggering mechanism is suitable for plants with no centralized sensory node as the triggering mechanism can be evaluated locally at each individual sensor. From a geometrical viewpoint, the mechanism is a sequence of hyper-rectangles surrounding the optimal state trajectory such that robust recursive feasibility and robust stability are guaranteed. The design of the triggering mechanism is cast as a constrained parametric-in-set optimization problem with the volume of the set as the objective function. Re-parameterized in terms of the set vertices, we show that the problem admits a finite tractable convex program reformulation and a linear program relaxation. Several numerical examples are presented to demonstrate the effectiveness and limitations of the theoretical results.

[49]  arXiv:2008.10362 (cross-list from math.OC) [pdf, other]
Title: Fast Approximate Dynamic Programming for Input-Affine Dynamics
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)

We propose two novel numerical schemes for approximate implementation of the Dynamic Programming (DP) operation concerned with finite-horizon optimal control of discrete-time, stochastic systems with input-affine dynamics. The proposed algorithms involve discretization of the state and input spaces, and are based on an alternative path that solves the dual problem corresponding to the DP operation. We provide error bounds for the proposed algorithms, along with a detailed analysis of their computational complexity. In particular, for a specific class of problems with separable data in the state and input variables, the proposed approach can reduce the typical time complexity of the DP operation from O(XU) to O(X+U) where X and U denote the size of the discrete state and input spaces, respectively. In a broader perspective, the key contribution here can be viewed as an algorithmic transformation of the minimization in the DP operation to addition via discrete conjugation. This bridge enables us to utilize any complexity reduction on the discrete conjugation front within the proposed algorithms. In particular, motivated by the recent development of quantum algorithms for computing the discrete conjugate transform, we discuss the possibility of a quantum mechanical implementation of the proposed algorithms.

[50]  arXiv:2102.08880 (cross-list from math.OC) [pdf, other]
Title: Fast Approximate Dynamic Programming for Infinite-Horizon Continuous-State Markov Decision Processes
Comments: 17 pages, 1 figure. arXiv admin note: text overlap with arXiv:2008.10362
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)

In this article, we consider the infinite-horizon, discounted cost, optimal control of discrete-time systems with separable cost and constraint in the state and input variables. Starting from deterministic linear dynamics, we introduce a novel numerical algorithm for implementation of the value iteration (VI) algorithm in the conjugate domain, using the Linear-time Legendre Transform algorithm. Detailed analyses of the convergence, complexity, and error of the proposed algorithm are provided. In particular, with a discretization of size $X$ and $U$ for the state and input spaces, respectively, the proposed approach can reduce the time complexity of each iteration of the VI algorithm from $O(XU)$ to $O(X)$, by replacing the minimization operation in the primal domain with a simple addition in the conjugate domain. Also discussed are the direct extensions of the proposed algorithm for nonlinear dynamics and stochastic dynamics with additive noise.

[51]  arXiv:2105.03579 (cross-list from cs.CV) [pdf, other]
Title: Unsupervised Remote Sensing Super-Resolution via Migration Image Prior
Comments: 6 pages, 4 figures. IEEE International Conference on Multimedia and Expo (ICME) 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Recently, satellites with high temporal resolution have attracted wide attention in various practical applications. Due to limitations of bandwidth and hardware cost, however, the spatial resolution of such satellites is considerably low, largely limiting their potential in scenarios that require spatially explicit information. To improve image resolution, numerous approaches based on training low-high resolution pairs have been proposed to address the super-resolution (SR) task. Despite their success, however, low/high spatial resolution pairs are usually difficult to obtain in satellites with a high temporal resolution, making such approaches in SR impractical to use. In this paper, we propose a new unsupervised learning framework, called "MIP", which achieves SR tasks without low/high resolution image pairs. First, random noise maps are fed into a designed generative adversarial network (GAN) for reconstruction. Then, the proposed method converts the reference image to latent space as the migration image prior. Finally, we update the input noise via an implicit method, and further transfer the texture and structured information from the reference image. Extensive experimental results on the Draper dataset show that MIP achieves significant improvements over state-of-the-art methods both quantitatively and qualitatively. The proposed MIP is open-sourced at this http URL

[52]  arXiv:2105.03589 (cross-list from cs.IT) [pdf, ps, other]
Title: Relay Assisted Underlay Cognitive Radio Networks with Multiple Users
Comments: 7 pages, 4 figures
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

In this letter, we consider an underlay cognitive radio network assisted by dual-hop decode-and-forward (DF) relaying. For a general multi-user network, we adopt a max-min fairness relay selection scheme and analyse the outage probability when the channels are subject to independent and non-identical Nakagami-m fading. The relay network operates within the constraint imposed on the peak interference power tolerable by the primary receiver. We then analyse the asymptotic outage probability performance and illustrate the existence of i) the full-diversity order when the interference level at the primary user increases proportionally with the relay transmit power; and ii) an outage floor when the transmit powers of the relays are restricted by the primary receiver. We also analyse the outage probability with imperfect channel state information (CSI) and the average throughput over Rayleigh fading channels. Illustrative analytical results are accurately validated by numerical simulations.

[53]  arXiv:2105.03607 (cross-list from math.DS) [pdf, other]
Title: Mean Subtraction and Mode Selection in Dynamic Mode Decomposition
Comments: 43 pages, 7 figures
Subjects: Dynamical Systems (math.DS); Signal Processing (eess.SP); Data Analysis, Statistics and Probability (physics.data-an)

Koopman mode analysis has provided a framework for analysis of nonlinear phenomena across a plethora of fields. Its numerical implementation via Dynamic Mode Decomposition (DMD) has been extensively deployed and improved upon over the last decade. We address the problems of mean subtraction and DMD mode selection in the context of finite dimensional Koopman invariant subspaces.
Preprocessing of data by subtraction of the temporal mean of a time series has been a point of contention in companion matrix-based DMD. This stems from the potential of said preprocessing to render DMD equivalent to temporal DFT. We prove that this equivalence is impossible when the order of the DMD-based representation of the dynamics exceeds the dimension of the system. Moreover, this parity of DMD and DFT is mostly indicative of an inadequacy of data, in the sense that the number of snapshots taken is not enough to represent the true dynamics of the system.
We then vindicate the practice of pruning DMD eigenvalues based on the norm of the respective modes. Once a minimum number of time delays has been taken, DMD eigenvalues corresponding to DMD modes with low norm are shown to be spurious, and hence must be discarded. When dealing with mean-subtracted data, the above criterion for detecting synthetic eigenvalues can be applied after additional pre-processing. This takes the form of an eigenvalue constraint on Companion DMD, or yet another time delay.

[54]  arXiv:2105.03642 (cross-list from cs.IT) [pdf, other]
Title: MIMO Terahertz Quantum Key Distribution
Comments: Submitted to IEEE Communications Letters
Subjects: Information Theory (cs.IT); Cryptography and Security (cs.CR); Signal Processing (eess.SP); Quantum Physics (quant-ph)

We propose a multiple-input multiple-output (MIMO) quantum key distribution (QKD) scheme for improving the secret key rates and increasing the maximum transmission distance for terahertz (THz) frequency range applications operating at room temperature. We propose a transmit beamforming and receive combining scheme that converts the rank-$r$ MIMO channel between Alice and Bob into $r$ parallel lossy quantum channels whose transmittances depend on the non-zero singular values of the MIMO channel. The MIMO transmission scheme provides a multiplexing gain of $r$, along with a beamforming and array gain equal to the product of the number of transmit and receive antennas. This improves the secret key rate and extends the maximum transmission distance. Our simulation results show that multiple antennas are necessary to overcome the high free-space path loss at THz frequencies. Positive key rates are achievable in the $10-30$ THz frequency range that can be used for both indoor and outdoor QKD applications for beyond fifth generation ultra-secure wireless communications systems.

[55]  arXiv:2105.03647 (cross-list from cs.CV) [pdf, other]
Title: A Novel Triplet Sampling Method for Multi-Label Remote Sensing Image Search and Retrieval
Comments: The paper is under review. Our code is available online at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Learning the similarity between remote sensing (RS) images forms the foundation for content-based RS image retrieval (CBIR). Recently, deep metric learning approaches that map the semantic similarity of images into an embedding space have become very popular in RS. A common approach for learning the metric space relies on the selection of triplets of similar (positive) and dissimilar (negative) images to a reference image called an anchor. Choosing triplets is a difficult task, particularly for multi-label RS CBIR, where each training image is annotated by multiple class labels. To address this problem, in this paper we propose a novel triplet sampling method in the framework of deep neural networks (DNNs) defined for multi-label RS CBIR problems. The proposed method selects a small set of the most representative and informative triplets based on two main steps. In the first step, a set of anchors that are diverse to each other in the embedding space is selected from the current mini-batch using an iterative algorithm. In the second step, different sets of positive and negative images are chosen for each anchor by evaluating relevancy, hardness, and diversity of the images among each other based on a novel ranking strategy. Experimental results obtained on two multi-label benchmark archives show that the selection of the most informative and representative triplets in the context of DNNs results in: i) reducing the computational complexity of the training phase of the DNNs without any significant loss on the performance; and ii) an increase in learning speed since informative triplets allow fast convergence. The code of the proposed method is publicly available at https://git.tu-berlin.de/rsim/image-retrieval-from-triplets.

[56]  arXiv:2105.03716 (cross-list from cs.CL) [pdf, ps, other]
Title: Continuous representations of intents for dialogue systems
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Intent modelling has become an important part of modern dialogue systems. With the rapid expansion of practical dialogue systems and virtual assistants, such as Amazon Alexa, Apple Siri, and Google Assistant, the interest has only increased. However, up until recently the focus has been on detecting a fixed, discrete number of seen intents. Recent years have seen some work done on unseen intent detection in the context of zero-shot learning. This paper continues the prior work by proposing a novel model where intents are continuous points placed in a specialist Intent Space that yields several advantages. First, the continuous representation enables the investigation of relationships between the seen intents. Second, it allows any unseen intent to be reliably represented given limited quantities of data. Finally, this paper will show how the proposed model can be augmented with unseen intents without retraining any of the seen ones. Experiments show that the model can reliably add unseen intents with a high accuracy while retaining a high performance on the seen intents.

[57]  arXiv:2105.03838 (cross-list from cs.LG) [pdf, other]
Title: HyperHyperNetworks for the Design of Antenna Arrays
Subjects: Machine Learning (cs.LG); Image and Video Processing (eess.IV); Signal Processing (eess.SP)

We present deep learning methods for the design of arrays and single instances of small antennas. Each design instance is conditioned on a target radiation pattern and is required to conform to specific spatial dimensions and to include, as part of its metallic structure, a set of predetermined locations. The solution, in the case of a single antenna, is based on a composite neural network that combines a simulation network, a hypernetwork, and a refinement network. In the design of the antenna array, we add an additional design level and employ a hypernetwork within a hypernetwork. The learning objective is based on measuring the similarity of the obtained radiation pattern to the desired one. Our experiments demonstrate that our approach is able to design novel antennas and antenna arrays that are compliant with the design requirements, considerably better than the baseline methods. We compare the solutions obtained by our method to existing designs and demonstrate a high level of overlap. When designing the antenna array of a cellular phone, the obtained solution displays improved properties over the existing one.

[58]  arXiv:2105.03842 (cross-list from cs.CL) [pdf, other]
Title: FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Error correction techniques have been used to refine the output sentences from automatic speech recognition (ASR) models and achieve a lower word error rate (WER) than original ASR outputs. Previous works usually use a sequence-to-sequence model to correct an ASR output sentence autoregressively, which causes large latency and cannot be deployed in online ASR services. A straightforward solution to reduce latency, inspired by non-autoregressive (NAR) neural machine translation, is to use an NAR sequence generation model for ASR error correction, which, however, comes at the cost of significantly increased ASR error rate. In this paper, observing distinctive error patterns and correction operations (i.e., insertion, deletion, and substitution) in ASR, we propose FastCorrect, a novel NAR error correction model based on edit alignment. In training, FastCorrect aligns each source token from an ASR output sentence to the target tokens from the corresponding ground-truth sentence based on the edit distance between the source and target sentences, and extracts the number of target tokens corresponding to each source token during edition/correction, which is then used to train a length predictor and to adjust the source tokens to match the length of the target sentence for parallel generation. In inference, the token number predicted by the length predictor is used to adjust the source tokens for target sequence generation. Experiments on the public AISHELL-1 dataset and an internal industrial-scale ASR dataset show the effectiveness of FastCorrect for ASR error correction: 1) it speeds up the inference by 6-9 times and maintains the accuracy (8-14% WER reduction) compared with the autoregressive correction model; and 2) it outperforms the accuracy of popular NAR models adopted in neural machine translation by a large margin.

[59]  arXiv:2105.03924 (cross-list from math.OC) [pdf, other]
Title: Computationally Efficient Dynamic Traffic Optimization Of Railway Systems
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)

In this paper we investigate real-time, dynamic traffic optimization in railway systems. In order to enable practical solution times, we operate the optimizer in a receding horizon fashion and with optimization horizons that are shorter than the full path to destinations, using a model predictive control (MPC) approach. We present new procedures to establish safe prediction horizons, providing formal guarantees that the system is operated in a way that satisfies hard safety constraints despite the fact that not all future train interactions are taken into account, by characterizing the minimal required optimization horizons. We also show that any feasible solution to our proposed models is sufficient to maintain a safe, automated operation of the railway system, providing an upper bound on the computations strictly required. Additionally, we show that these minimal optimization horizons also characterize an upper bound on computations required to construct a feasible solution for any arbitrary optimization horizon, paving the way for anytime algorithms. Finally, our results enable systematic solution reuse, when previous schedules are available. We test our approach on a detailed simulation environment of a real-world railway system used for freight transport.

[60]  arXiv:2105.04040 (cross-list from cs.CV) [pdf, other]
Title: Truly shift-equivariant convolutional neural networks with adaptive polyphase upsampling
Subjects: Computer Vision and Pattern Recognition (cs.CV); Signal Processing (eess.SP)

Convolutional neural networks lack shift equivariance due to the presence of downsampling layers. In image classification, adaptive polyphase downsampling (APS-D) was recently proposed to make CNNs perfectly shift invariant. However, in networks used for image reconstruction tasks, it cannot by itself restore shift equivariance. We address this problem by proposing adaptive polyphase upsampling (APS-U), a non-linear extension of conventional upsampling, which allows CNNs to exhibit perfect shift equivariance. With MRI and CT reconstruction experiments, we show that networks containing APS-D/U layers exhibit state-of-the-art equivariance performance without sacrificing image reconstruction quality. In addition, unlike prior methods like data augmentation and anti-aliasing, the gains in equivariance obtained from APS-D/U also extend to images outside the training distribution.
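The downsampling issue mentioned in this abstract can be illustrated with a minimal 1-D sketch. It assumes that adaptive polyphase downsampling keeps the polyphase component with the largest l2 norm; that selection rule, the function names, and the toy signal are assumptions made here for illustration and are not taken from the abstract.

```python
import math

def polyphase_components(x, stride=2):
    """Split a 1-D sequence into its `stride` polyphase components."""
    return [x[k::stride] for k in range(stride)]

def aps_downsample(x, stride=2):
    """Keep the polyphase component with the largest l2 norm (assumed criterion)."""
    comps = polyphase_components(x, stride)
    norms = [math.sqrt(sum(v * v for v in c)) for c in comps]
    return comps[norms.index(max(norms))]

def circular_shift(x, s):
    """Circularly shift a list to the right by s positions."""
    s %= len(x)
    return x[-s:] + x[:-s] if s else list(x)

# Hypothetical toy signal of even length.
signal = [0.3, 2.0, -1.1, 0.7, 0.0, -2.5, 1.4, 0.9]

# Conventional stride-2 downsampling always keeps the even-indexed samples,
# so a one-sample shift of the input selects a completely different set.
plain = signal[0::2]
plain_shifted = circular_shift(signal, 1)[0::2]

# The adaptive variant selects the same samples before and after the shift;
# only their circular position in the output changes.
adaptive = aps_downsample(signal)
adaptive_shifted = aps_downsample(circular_shift(signal, 1))
```

In this sketch the adaptive output for the shifted input is a circular shift of the adaptive output for the original input, whereas the conventional outputs share no such relation.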

[61]  arXiv:2105.04065 (cross-list from cs.SD) [pdf, other]
Title: Voice activity detection in the wild: A data-driven approach using teacher-student training
Journal-ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 1542-1555, 2021
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

Voice activity detection is an essential pre-processing component for speech-related tasks such as automatic speech recognition (ASR). Traditional supervised VAD systems obtain frame-level labels from an ASR pipeline by using, e.g., a Hidden Markov model. These ASR models are commonly trained on clean and fully transcribed data, limiting VAD systems to be trained on clean or synthetically noised datasets. Therefore, a major challenge for supervised VAD systems is their generalization towards noisy, real-world data. This work proposes a data-driven teacher-student approach for VAD, which utilizes vast and unconstrained audio data for training. Unlike previous approaches, only weak labels during teacher training are required, enabling the utilization of any real-world, potentially noisy dataset. Our approach firstly trains a teacher model on a source dataset (Audioset) using clip-level supervision. After training, the teacher provides frame-level guidance to a student model on an unlabeled, target dataset. A multitude of student models trained on mid- to large-sized datasets are investigated (Audioset, Voxceleb, NIST SRE). Our approach is then respectively evaluated on clean, artificially noised, and real-world data. We observe significant performance gains in artificially noised and real-world scenarios. Lastly, we compare our approach against other unsupervised and supervised VAD methods, demonstrating our method's superiority.

[62]  arXiv:2105.04075 (cross-list from cs.CV) [pdf]
Title: CFPNet-M: A Light-Weight Encoder-Decoder Based Network for Multimodal Biomedical Image Real-Time Segmentation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Currently, developments of deep learning techniques are proving instrumental in identifying, classifying, and quantifying patterns in medical images. Segmentation is one of the important applications in medical image analysis. In this regard, U-Net is the predominant approach to medical image segmentation tasks. However, we found that U-Net based models have limitations in several aspects, for example, millions of parameters that consume considerable computational resources and memory, a lack of global information, and missed detection of some difficult objects. Therefore, we applied two modifications to improve the U-Net model: 1) designed and added the dilated channel-wise CNN module, 2) simplified the U shape network. Based on these two modifications, we proposed a novel light-weight architecture -- Channel-wise Feature Pyramid Network for Medicine (CFPNet-M). To evaluate our method, we selected five datasets with different modalities: thermography, electron microscopy, endoscopy, dermoscopy, and digital retinal images. We compared its performance with several models having different parameter scales. This paper also involves our previous studies of DC-UNet and some commonly used light-weight neural networks. We applied the Tanimoto similarity instead of the Jaccard index for gray-level image measurements. By comparison, CFPNet-M achieves comparable segmentation results on all five medical datasets with only 0.65 million parameters, which is about 2% of U-Net, and 8.8 MB memory. Meanwhile, the inference speed can reach 80 FPS on a single RTX 2070Ti GPU with the 256 by 192 pixels input size.
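The choice of the Tanimoto similarity over the Jaccard index for gray-level images can be illustrated with a short sketch. It uses the common real-valued definition T(a, b) = a·b / (|a|² + |b|² - a·b), which is an assumption here; the abstract does not state the exact formula the authors use. The function names and sample masks are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto similarity for real-valued vectors (common definition, assumed)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a)
    norm_b = sum(y * y for y in b)
    denom = norm_a + norm_b - dot
    # Two all-zero inputs are treated as identical.
    return 1.0 if denom == 0 else dot / denom

def jaccard(a, b):
    """Jaccard index for binary masks: |intersection| / |union|."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return 1.0 if union == 0 else inter / union

# Hypothetical flattened masks.
binary_pred = [1, 1, 0, 0, 1]
binary_true = [1, 0, 0, 1, 1]
gray_pred = [0.9, 0.4, 0.1, 0.2, 0.8]
gray_true = [1.0, 0.0, 0.0, 1.0, 1.0]

binary_score = tanimoto(binary_pred, binary_true)
gray_score = tanimoto(gray_pred, gray_true)
```

On binary masks this definition reduces to the Jaccard index, while on gray-level predictions it still yields a graded score without thresholding.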

[63]  arXiv:2105.04079 (cross-list from cs.SD) [pdf, ps, other]
Title: Sampling-Frequency-Independent Audio Source Separation Using Convolution Layer Based on Impulse Invariant Method
Comments: 5 pages, 3 figures, accepted for European Signal Processing Conference 2021 (EUSIPCO 2021)
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

Audio source separation is often used as preprocessing of various applications, and one of its ultimate goals is to construct a single versatile model capable of dealing with the varieties of audio signals. Since sampling frequency, one of the audio signal varieties, is usually application specific, the preceding audio source separation model should be able to deal with audio signals of all sampling frequencies specified in the target applications. However, conventional models based on deep neural networks (DNNs) are trained only at the sampling frequency specified by the training data, and there are no guarantees that they work with unseen sampling frequencies. In this paper, we propose a convolution layer capable of handling arbitrary sampling frequencies by a single DNN. Through music source separation experiments, we show that the introduction of the proposed layer enables a conventional audio source separation model to consistently work with even unseen sampling frequencies.

[64]  arXiv:2105.04090 (cross-list from cs.SD) [pdf, other]
Title: MuseMorphose: Full-Song and Fine-Grained Music Style Transfer with Just One Transformer VAE
Comments: Preprint. 26 pages, 7 figures, and 8 tables
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)

Transformers and variational autoencoders (VAE) have been extensively employed for symbolic (e.g., MIDI) domain music generation. While the former boast an impressive capability in modeling long sequences, the latter allow users to willingly exert control over different parts (e.g., bars) of the music to be generated. In this paper, we are interested in bringing the two together to construct a single model that exhibits both strengths. The task is split into two steps. First, we equip Transformer decoders with the ability to accept segment-level, time-varying conditions during sequence generation. Subsequently, we combine the developed and tested in-attention decoder with a Transformer encoder, and train the resulting MuseMorphose model with the VAE objective to achieve style transfer of long musical pieces, in which users can specify musical attributes including rhythmic intensity and polyphony (i.e., harmonic fullness) they desire, down to the bar level. Experiments show that MuseMorphose outperforms recurrent neural network (RNN) based prior art on numerous widely-used metrics for style transfer tasks.

[65]  arXiv:2105.04091 (cross-list from cs.IT) [pdf, ps, other]
Title: Diversity Analysis of Millimeter-Wave OFDM Massive MIMO Systems
Comments: 12 pages, 4 figures
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

We analyze the diversity gain for a distributed antenna subarray employing orthogonal frequency-division multiplexing (OFDM) in millimeter-wave (mm-Wave) massive multiple-input multiple-output (MIMO) systems. We show that the diversity gain depends on the number of transmitted data streams, the number of remote antenna units (RAUs), and the number of propagation paths between RAUs. Furthermore, we show that by using bit-interleaved coded multiple beamforming (BICMB), one can achieve the maximum diversity gain in a distributed antenna subarray system. The assumption in both scenarios is that the numbers of antennas at the transmitter and the receiver are large enough and that channel state information (CSI) is known at the transmitter and the receiver.

[66]  arXiv:2105.04107 (cross-list from cs.IT) [pdf, other]
Title: MmWave MIMO Communication with Semi-Passive RIS: A Low-Complexity Channel Estimation Scheme
Comments: 6 pages, 3 figures
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

Reconfigurable intelligent surfaces (RISs) have recently received widespread attention in the field of wireless communication. An RIS can be controlled to reflect incident waves from the transmitter towards the receiver, a feature that is believed to fundamentally contribute to beyond 5G wireless technology. The typical RIS consists of entirely passive elements, which requires the high-dimensional channel estimation to be done elsewhere. Therefore, in this paper, we present a semi-passive large-scale RIS architecture equipped with only a small fraction of simplified receiver units with only 1-bit quantization. Based on this architecture, we first propose an alternating direction method of multipliers (ADMM)-based approach to recover the training signals at the passive RIS elements. We then obtain the global channel by combining a channel sparsification step with the generalized approximate message passing (GAMP) algorithm. Our proposed scheme exploits both the sparsity and low-rankness properties of the channel in the joint spatial-frequency domain of a wideband mmWave multiple-input-multiple-output (MIMO) communication system. Simulation results show that the proposed algorithm can significantly reduce the pilot signaling needed for accurate channel estimation and outperform previous methods, even with fewer receiver units.

[67]  arXiv:2105.04124 (cross-list from cs.SD) [pdf, other]
Title: MASS: Multi-task Anthropomorphic Speech Synthesis Framework
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

Text-to-Speech (TTS) synthesis plays an important role in human-computer interaction. Currently, most TTS technologies focus on the naturalness of speech, namely, making the speech sound human-like. However, the key tasks of expressing emotion and speaker identity are ignored, which limits the application scenarios of TTS synthesis technology. To make the synthesized speech more realistic and expand the application scenarios, we propose a multi-task anthropomorphic speech synthesis framework (MASS), which can synthesize speech from text with specified emotion and speaker identity. The MASS framework consists of a base TTS module and two novel voice conversion modules: the emotional voice conversion module and the speaker voice conversion module. We propose a deep emotion voice conversion model (DEVC) and a deep speaker voice conversion model (DSVC) based on convolution residual networks. They solve the problem of feature loss during voice conversion. Model training does not depend on parallel datasets, and the models are capable of many-to-many voice conversion. In the emotional voice conversion and speaker voice conversion experiments, as well as the multi-task speech synthesis experiments, the results show that DEVC and DSVC convert speech effectively. The quantitative and qualitative evaluation results of the multi-task speech synthesis experiments show that MASS can effectively synthesize speech with specified text, emotion and speaker identity.

[68]  arXiv:2105.04138 (cross-list from cs.IT) [pdf, ps, other]
Title: Near Interference-Free Space-Time User Scheduling for MmWave Cellular Network
Subjects: Information Theory (cs.IT); Systems and Control (eess.SY)

The highly directional beams applied in millimeter wave (mmWave) cellular networks make it possible to achieve near interference-free (NIF) transmission under judiciously designed space-time user scheduling, where the power of intra-/inter-cell interference between any two users is below a predefined threshold. In this paper, we investigate two aspects of the NIF space-time user scheduling in a multi-cell mmWave network with multi-RF-chain base stations. Firstly, given that each user has a requirement on the number of space-time resource elements, we study the NIF user scheduling problem to minimize the unfulfilled user requirements, so that the space-time resources can be utilized most efficiently and meanwhile all strong interferences are avoided. A near-optimal scheduling algorithm is proposed with performance close to the lower bound of unfulfilled requirements. Furthermore, we study the joint NIF user scheduling and power allocation problem to minimize the power consumption under the constraint of rate requirements. Based on our proposed NIF scheduling, an energy-efficient joint scheduling and power allocation scheme is designed with limited channel state information, which outperforms the existing independent set based schemes, and has near-optimal performance as well.

[69]  arXiv:2105.04194 (cross-list from cs.IT) [pdf, other]
Title: The Modulo Radon Transform: Theory, Algorithms and Applications
Comments: 32 pages, submitted for possible publication
Subjects: Information Theory (cs.IT); Computer Vision and Pattern Recognition (cs.CV); Signal Processing (eess.SP)

Recently, experiments have been reported where researchers were able to perform high dynamic range (HDR) tomography in a heuristic fashion, by fusing multiple tomographic projections. This approach to HDR tomography has been inspired by HDR photography and inherits the same disadvantages. Taking a computational imaging approach to the HDR tomography problem, we here suggest a new model based on the Modulo Radon Transform (MRT), which we rigorously introduce and analyze. By harnessing a joint design between hardware and algorithms, we present a single-shot HDR tomography approach, which, to our knowledge, is the only approach that is backed by mathematical guarantees.
On the hardware front, instead of recording the Radon Transform projections that may potentially saturate, we propose to measure modulo values of the same. This ensures that the HDR measurements are folded into a lower dynamic range. On the algorithmic front, our recovery algorithms reconstruct the HDR images from folded measurements. Beyond mathematical aspects such as injectivity and inversion of the MRT for different scenarios including band-limited and approximately compactly supported images, we also provide a first proof-of-concept demonstration. To do so, we implement MRT by experimentally folding tomographic measurements available as an open source data set using our custom designed modulo hardware. Our reconstruction clearly shows the advantages of our approach for experimental data. In this way, our MRT based solution paves a path for HDR acquisition in a number of related imaging problems.

[70]  arXiv:2105.04210 (cross-list from cs.LG) [pdf, other]
Title: Robust Graph Learning Under Wasserstein Uncertainty
Comments: 21 pages, 9 figures
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)

Graphs play a crucial role in different fields since they are powerful tools to unveil intrinsic relationships among signals. In many scenarios, an accurate graph structure representing signals is not available at all, which motivates people to learn a reliable graph structure directly from observed signals. However, in real life, uncertainty in the observed signals is inevitable due to noisy measurements or limited observability, which reduces the reliability of the learned graph. To this end, we propose a graph learning framework using Wasserstein distributionally robust optimization (WDRO), which handles uncertainty in data by defining an uncertainty set on distributions of the observed data. Specifically, two models are developed: one assumes that all distributions in the uncertainty set are Gaussian, and the other makes no prior distributional assumption. Instead of using an interior point method directly, we propose two algorithms to solve the corresponding models and show that our algorithms are more time-efficient. In addition, we reformulate both models as Semi-Definite Programming (SDP) problems and illustrate that they are intractable for large-scale graphs. Experiments on both synthetic and real-world data are carried out to validate the proposed framework, and they show that our scheme can learn a reliable graph in the context of uncertainty.

[71]  arXiv:2105.04260 (cross-list from cs.CR) [pdf, other]
Title: EPICTWIN: An Electric Power Digital Twin for Cyber Security Testing, Research and Education
Subjects: Cryptography and Security (cs.CR); Systems and Control (eess.SY)

Cyber-Physical Systems (CPS) rely on advanced communication and control technologies to efficiently manage devices and the flow of information in the system. However, a wide variety of potential security challenges has emerged due to the evolution of critical infrastructures (CI) from siloed sub-systems into connected and integrated networks. This is also the case for CI such as a smart grid. Smart grid security studies are carried out on physical test-beds to provide users with a platform to train on and test cyber attacks in a safe and controlled environment. However, physical test-beds are limited in that their physical configuration is hard to modify and they are difficult to scale.
To overcome these shortcomings, we built a digital power twin for a physical test-bed that is used for cyber security studies on smart grids. On the developed twin, the users can deploy real-world attacks and countermeasures to test and study their effectiveness. The difference from the physical test-bed is that its users may easily modify their power system components and configurations. Further, reproducing the twin in order to use it and advance the research is significantly cheaper. The developed twin has advanced features compared to any equivalent system in the literature. To illustrate a typical use case, we present a case study where a cyber attack is launched and discuss its implications.

[72]  arXiv:2105.04294 (cross-list from cs.HC) [pdf, other]
Title: Toward asynchronous EEG-based BCI: Detecting imagined words segments in continuous EEG signals
Comments: 10 pages, 14 figures
Journal-ref: Biomedical Signal Processing and Control. Volume 65 (2021), 102351
Subjects: Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Signal Processing (eess.SP)

An asynchronous Brain-Computer Interface (BCI) based on imagined speech is a tool that allows the user to control an external device or to emit a message at the moment the user desires by decoding EEG signals of imagined speech. In order to correctly implement these types of BCI, we must be able to detect, from a continuous signal, when the subject starts to imagine words. In this work, five methods of feature extraction based on wavelet decomposition, empirical mode decomposition, frequency energies, fractal dimension and chaos theory features are presented to solve the task of detecting imagined word segments from continuous EEG signals, as a preliminary study for a later implementation of an asynchronous BCI based on imagined speech. These methods are tested on three datasets using four different classifiers, and the highest F1 scores obtained are 0.73, 0.79, and 0.68 for each dataset, respectively. These results are promising for building a system that automates the segmentation of imagined word segments for later classification.

[73]  arXiv:2105.04309 (cross-list from cs.SD) [pdf, other]
Title: Multi-modal Conditional Bounding Box Regression for Music Score Following
Comments: Accepted for publication in the Proceedings of the 29th European Signal Processing Conference (EUSIPCO), Dublin, Ireland, 2021
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

This paper addresses the problem of sheet-image-based on-line audio-to-score alignment, also known as score following. Drawing inspiration from object detection, a conditional neural network architecture is proposed that directly predicts x,y coordinates of the matching positions in a complete score sheet image at each point in time for a given musical performance. Experiments are conducted on a synthetic polyphonic piano benchmark dataset, and the new method is compared to several existing approaches from the literature for sheet-image-based score following as well as an Optical Music Recognition baseline. The proposed approach achieves new state-of-the-art results and furthermore significantly improves the alignment performance on a set of real-world piano recordings by applying Impulse Responses as a data augmentation technique.

[74]  arXiv:2105.04349 (cross-list from cs.CV) [pdf, other]
Title: Generative Adversarial Registration for Improved Conditional Deformable Templates
Comments: 24 pages, 15 figures. Code is available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)

Deformable templates are essential to large-scale medical image registration, segmentation, and population analysis. Current conventional and deep network-based methods for template construction use only regularized registration objectives and often yield templates with blurry and/or anatomically implausible appearance, confounding downstream biomedical interpretation. We reformulate deformable registration and conditional template estimation as an adversarial game wherein we encourage realism in the moved templates with a generative adversarial registration framework conditioned on flexible image covariates. The resulting templates exhibit significant gain in specificity to attributes such as age and disease, better fit underlying group-wise spatiotemporal trends, and achieve improved sharpness and centrality. These improvements enable more accurate population modeling with diverse covariates for standardized downstream analyses and easier anatomical delineation for structures of interest.

[75]  arXiv:2105.04361 (cross-list from physics.class-ph) [pdf, other]
Title: Extension of the single-nonlinear-mode theory by linear attachments and application to exciter-structure interaction
Authors: Malte Krack
Subjects: Classical Physics (physics.class-ph); Systems and Control (eess.SY)

Under certain conditions, the dynamics of a nonlinear mechanical system can be represented by a single nonlinear modal oscillator. The properties of the modal oscillator can be determined by computational or experimental nonlinear modal analysis. The simplification to a single-nonlinear-mode model facilitates qualitative and global analysis, and substantially reduces the computational effort required for probabilistic methods and design optimization. Important limitations of this theory are that only purely mechanical systems can be analyzed and that the respective nonlinear mode has to be recomputed when the system's structural properties are varied. With the theoretical extension proposed in this work, it becomes feasible to attach linear subsystems to the primary mechanical system, and to approximate the dynamics of this coupled system using only the nonlinear mode of the primary mechanical system. The attachments must be described by linear ordinary or differential-algebraic equations with time-invariant coefficient matrices. The attachments do not need to be of purely mechanical nature, but may contain, for instance, electric, magnetic, acoustic, thermal or aerodynamic models. This considerably extends the range of utility of nonlinear modes to applications as diverse as model updating or vibration energy harvesting. As long as the attachments do not significantly deteriorate the host system's modal deflection shape, it is shown that their effect can be reduced to a complex-valued modal impedance and an imposed modal forcing term. In the present work, the proposed approach is computationally assessed for the analysis of exciter-structure interaction. More specifically, the force drop typically encountered in frequency response testing is revisited.

[76]  arXiv:2105.04396 (cross-list from cs.RO) [pdf, other]
Title: Stability Constrained Mobile Manipulation Planning on Rough Terrain
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)

This paper presents a framework that allows online dynamic-stability-constrained optimal trajectory planning of a mobile manipulator robot working on rough terrain. First, the kinematics model of a mobile manipulator robot, and the Zero Moment Point (ZMP) stability measure are presented as theoretical background. Then, a sampling-based quasi-static planning algorithm modified for stability guarantee and traction optimization in continuous dynamic motion is presented along with a mathematical proof. The robot's quasi-static path is then used as an initial guess to warm-start a nonlinear optimal control solver which may otherwise have difficulties finding a solution to the stability-constrained formulation efficiently. The performance and computational efficiency of the framework are demonstrated through an application to a simulated timber harvesting mobile manipulator machine working on varying terrain. The results demonstrate feasibility of online trajectory planning on varying terrain while satisfying the dynamic stability constraint.

[77]  arXiv:2105.04417 (cross-list from physics.med-ph) [pdf, other]
Title: A plug-and-play type field-deployable bio-agent free salicylic acid sensing system
Subjects: Medical Physics (physics.med-ph); Systems and Control (eess.SY)

Salicylic acid (SA) is a primary phytohormone released in response to stress, particularly biotic infections in plants. Monitoring SA levels may enable early disease detection in crops, allowing effective measures to be applied to reduce agricultural losses while increasing agricultural efficiency. Additionally, SA is an important chemical used extensively in the pharmaceutical and healthcare industry due to its analgesic and anti-inflammatory properties. Developing a fast and accurate way of monitoring SA levels in human serum can have a life-saving impact for patients suffering from overdosing and/or mis-dosing. In this work, we present a low-cost, portable, and field-deployable electrochemical SA sensing system aimed at achieving the above-mentioned goals. The developed sensor consists of a plug-and-play type device equipped with specially designed high-accuracy sensing electronics and a novel procedure for robust data analysis. The developed sensor exhibits excellent linearity, sensitivity, and selectivity. The practical applicability of the developed sensor was also demonstrated by measuring SA levels in real samples with good accuracy.

[78]  arXiv:2105.04458 (cross-list from cs.SD) [pdf, other]
Title: Learning Robust Latent Representations for Controllable Speech Synthesis
Comments: Accepted in ACL2021 Findings
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

State-of-the-art Variational Auto-Encoders (VAEs) for learning disentangled latent representations give impressive results in discovering features like pitch, pause duration, and accent in speech data, leading to highly controllable text-to-speech (TTS) synthesis. However, these LSTM-based VAEs fail to learn latent clusters of speaker attributes when trained on either limited or noisy datasets. Further, different latent variables start encoding the same features, limiting the control and expressiveness during speech synthesis. To resolve these issues, we propose RTI-VAE (Reordered Transformer with Information reduction VAE) where we minimize the mutual information between different latent variables and devise a modified Transformer architecture with layer reordering to learn controllable latent representations in speech data. We show that RTI-VAE reduces the cluster overlap of speaker attributes by at least 30% over LSTM-VAE and by at least 7% over vanilla Transformer-VAE.

[79]  arXiv:2105.04472 (cross-list from cs.RO) [pdf, other]
Title: Safety of the Intended Driving Behavior Using Rulebooks
Journal-ref: 2020 IEEE Intelligent Vehicles Symposium (IV)
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)

Autonomous Vehicles (AVs) are complex systems that drive in uncertain environments and potentially navigate unforeseeable situations. Safety of these systems requires not only an absence of malfunctions but also high performance of functions in many different scenarios. The ISO/PAS 21448 [1] guidance recommends a process to ensure the Safety of the Intended Functionality (SOTIF) for road vehicles. This process starts with a functional specification that fully describes the intended functionality and further includes the verification and validation that the AV meets this specification. For the path planning function, defining the correct sequence of control actions for each vehicle in all potential driving situations is intractable. In this paper, the authors provide a link between the Rulebooks framework, presented by [2], and the SOTIF process. We establish that Rulebooks provide a functional description of the path planning task in an AV and discuss the potential usage of the method for verification and validation.

[80]  arXiv:2105.04488 (cross-list from cs.SD) [pdf, other]
Title: A Deep Reinforcement Learning Approach to Audio-Based Navigation in a Multi-Speaker Environment
Comments: To be published in ICASSP 2021
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

In this work we use deep reinforcement learning to create an autonomous agent that can navigate in a two-dimensional space using only raw auditory sensory information from the environment, a problem that has received very little attention in the reinforcement learning literature. Our experiments show that the agent can successfully identify a particular target speaker among a set of $N$ predefined speakers in a room and move itself towards that speaker, while avoiding collision with other speakers or going outside the room boundaries. The agent is shown to be robust to speaker pitch shifting and it can learn to navigate the environment, even when a limited number of training utterances are available for each speaker.

[81]  arXiv:2105.04489 (cross-list from cs.CV) [pdf, other]
Title: Spoken Moments: Learning Joint Audio-Visual Representations from Video Descriptions
Comments: To appear at CVPR 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)

When people observe events, they are able to abstract key information and build concise summaries of what is happening. These summaries include contextual and semantic information describing the important high-level details (what, where, who and how) of the observed event and exclude background information that is deemed unimportant to the observer. With this in mind, the descriptions people generate for videos of different dynamic events can greatly improve our understanding of the key information of interest in each video. These descriptions can be captured in captions that provide expanded attributes for video labeling (e.g. actions/objects/scenes/sentiment/etc.) while allowing us to gain new insight into what people find important or necessary to summarize specific events. Existing caption datasets for video understanding are either small in scale or restricted to a specific domain. To address this, we present the Spoken Moments (S-MiT) dataset of 500k spoken captions each attributed to a unique short video depicting a broad range of different events. We collect our descriptions using audio recordings to ensure that they remain as natural and concise as possible while allowing us to scale the size of a large classification dataset. In order to utilize our proposed dataset, we present a novel Adaptive Mean Margin (AMM) approach to contrastive learning and evaluate our models on video/caption retrieval on multiple datasets. We show that our AMM approach consistently improves our results and that models trained on our Spoken Moments dataset generalize better than those trained on other video-caption datasets.

[82]  arXiv:2105.04508 (cross-list from cs.CV) [pdf, other]
Title: MDA-Net: Multi-Dimensional Attention-Based Neural Network for 3D Image Segmentation
Authors: Rutu Gandhi, Yi Hong
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Segmenting an entire 3D image often has high computational complexity and requires large memory consumption; by contrast, performing volumetric segmentation in a slice-by-slice manner is efficient but does not fully leverage the 3D data. To address this challenge, we propose a multi-dimensional attention network (MDA-Net) to efficiently integrate slice-wise, spatial, and channel-wise attention into a U-Net based network, which results in high segmentation accuracy with a low computational cost. We evaluate our model on the MICCAI iSeg and IBSR datasets, and the experimental results demonstrate consistent improvements over existing methods.

Replacements for Tue, 11 May 21

[83]  arXiv:1903.04656 (replaced) [pdf, other]
Title: Deep Log-Likelihood Ratio Quantization
Comments: Accepted for publication at EUSIPCO 2019. Camera-ready version
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Machine Learning (stat.ML)
[84]  arXiv:1904.12175 (replaced) [pdf, other]
Title: Unsupervised and Unregistered Hyperspectral Image Super-Resolution with Mutual Dirichlet-Net
Comments: IEEE Transactions on Remote Sensing and Geoscience
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[85]  arXiv:1906.07265 (replaced) [pdf, other]
Title: Recovering shared structure from multiple networks with unknown edge distributions
Subjects: Statistics Theory (math.ST); Machine Learning (cs.LG); Signal Processing (eess.SP); Methodology (stat.ME); Machine Learning (stat.ML)
[86]  arXiv:1906.07849 (replaced) [pdf, other]
Title: Deep Learning-Based Quantization of L-Values for Gray-Coded Modulation
Comments: Submitted to IEEE Globecom 2019
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Machine Learning (stat.ML)
[87]  arXiv:1907.11210 (replaced) [pdf, other]
Title: HUGE2: a Highly Untangled Generative-model Engine for Edge-computing
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[88]  arXiv:1909.07558 (replaced) [src]
Title: HAD-GAN: A Human-perception Auxiliary Defense GAN to Defend Adversarial Examples
Comments: There are some errors in our work. For example, "Notably, we linked a fully connected discriminant network in parallel at the penultimate level of the target classifier." (section 3.2) Incorrect description of the network structure can mislead readers. For example, "associate GAN with human imagination" is not true (section 1).
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
[89]  arXiv:1912.13494 (replaced) [pdf, ps, other]
Title: A frequency-domain analysis of inexact gradient methods
Authors: Oran Gannot
Comments: 42 pages; corrections and additional applications to accelerated methods. To appear in Mathematical Programming
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Systems and Control (eess.SY); Numerical Analysis (math.NA)
[90]  arXiv:2001.01283 (replaced) [pdf, other]
Title: One-Shot Coordination of First and Last Mode Transportation
Comments: Please contact the authors for the supplementary material
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
[91]  arXiv:2003.04529 (replaced) [pdf, ps, other]
Title: Controllability Issues of Linear Ensemble Systems over Multi-dimensional Parameterization Spaces
Authors: Xudong Chen
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
[92]  arXiv:2003.09133 (replaced) [pdf, other]
Title: Efficient computation of backprojection arrays for 3D light field deconvolution
Authors: Martin Eberhart
Comments: 15 pages, 11 figures, 1 table. This is a thoroughly reworked version of the manuscript, with a clearer structure. It avoids any ambiguities evoked by using the term 'transpose' in the context of multi-dimensional arrays
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[93]  arXiv:2003.14162 (replaced) [pdf, other]
Title: Deep State Space Models for Nonlinear System Identification
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Machine Learning (stat.ML)
[94]  arXiv:2005.07257 (replaced) [pdf, ps, other]
Title: Optimal Cybersecurity Investments in Large Networks Using SIS Model: Algorithm Design
Comments: 19 pages
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
[95]  arXiv:2005.12573 (replaced) [pdf, other]
Title: Learning Global and Local Features of Normal Brain Anatomy for Unsupervised Abnormality Detection
Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[96]  arXiv:2006.09534 (replaced) [pdf, other]
Title: Towards improving discriminative reconstruction via simultaneous dense and sparse coding
Comments: 20 pages
Subjects: Information Theory (cs.IT); Machine Learning (cs.LG); Signal Processing (eess.SP)
[97]  arXiv:2006.14201 (replaced) [pdf, other]
Title: Convex Incremental Dissipativity Analysis of Nonlinear Systems
Comments: Extended version; Original version (without Appendix B) submitted to Automatica. Changes: Proof of Theorem 6
Subjects: Systems and Control (eess.SY)
[98]  arXiv:2007.01682 (replaced) [pdf, other]
Title: Improving auto-encoder novelty detection using channel attention and entropy minimization
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
[99]  arXiv:2008.10594 (replaced) [pdf, other]
Title: Joint Design of RF and gradient waveforms via auto-differentiation for 3D tailored excitation in MRI
Subjects: Image and Video Processing (eess.IV)
[100]  arXiv:2008.11013 (replaced) [pdf, ps, other]
Title: Circuit Synthesis based on Prescribed Lagrangian
Comments: arXiv admin note: substantial text overlap with arXiv:2007.02143
Subjects: Applied Physics (physics.app-ph); Signal Processing (eess.SP)
[101]  arXiv:2008.11589 (replaced) [pdf, other]
Title: Learned Transferable Architectures Can Surpass Hand-Designed Architectures for Large Scale Speech Recognition
Comments: Accepted to ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[102]  arXiv:2008.12328 (replaced) [pdf, other]
Title: A Background-Agnostic Framework with Adversarial Training for Abnormal Event Detection in Video
Comments: Accepted in IEEE Transactions on Pattern Analysis and Machine Intelligence
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[103]  arXiv:2009.10550 (replaced) [pdf, other]
Title: URLLC with Massive MIMO: Analysis and Design at Finite Blocklength
Comments: 15 pages, 5 figures; to appear in IEEE Transactions on Wireless Communications
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
[104]  arXiv:2009.13326 (replaced) [pdf, ps, other]
Title: Database Assisted Nonlinear Least Squares Algorithm for Visible Light Positioning in NLOS Environments
Comments: 5 pages, 4 figures
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)
[105]  arXiv:2010.00179 (replaced) [pdf, ps, other]
Title: System Design and Analysis for Energy-Efficient Passive UAV Radar Imaging System using Illuminators of Opportunity
Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI)
[106]  arXiv:2010.03060 (replaced) [pdf, other]
Title: Contrastive Cross-Modal Pre-Training: A General Strategy for Small Sample Medical Imaging
Comments: This work is under review with the IEEE Journal of Biomedical and Health Informatics
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
[107]  arXiv:2010.09231 (replaced) [pdf, other]
Title: CT-CPP: 3D Coverage Path Planning for Unknown Terrain Reconstruction using Coverage Trees
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
[108]  arXiv:2010.09321 (replaced) [pdf, other]
Title: Poisson Image Deconvolution by a Plug-and-Play Quantum Denoising Scheme
Comments: 5 pages, 2 figures
Subjects: Image and Video Processing (eess.IV); Signal Processing (eess.SP)
[109]  arXiv:2010.11483 (replaced) [pdf, other]
Title: Multilingual Approach to Joint Speech and Accent Recognition with DNN-HMM Framework
Comments: 5 pages, Conference
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[110]  arXiv:2010.12146 (replaced) [pdf, other]
Title: Reliable Over-the-Air Computation by Amplify-and-Forward Based Relay
Journal-ref: in IEEE Access, vol. 9, pp. 53333-53342, 2021
Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
[111]  arXiv:2011.05308 (replaced) [pdf, other]
Title: EPSR: Edge Profile Super resolution
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[112]  arXiv:2011.06531 (replaced) [pdf, other]
Title: Image analysis for Alzheimer's disease prediction: Embracing pathological hallmarks for model architecture design
Comments: 8 pages, 1 figure, Machine Learning for Health (ML4H) at NeurIPS 2020 - Extended Abstract
Subjects: Machine Learning (cs.LG); Image and Video Processing (eess.IV); Machine Learning (stat.ML)
[113]  arXiv:2011.11128 (replaced) [pdf, other]
Title: Deep Learning in EEG: Advance of the Last Ten-Year Critical Period
Comments: Accepted for publication in the IEEE Transactions on Cognitive and Developmental Systems
Journal-ref: IEEE Transactions on Cognitive and Developmental Systems, 2021
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Neurons and Cognition (q-bio.NC)
[114]  arXiv:2011.14458 (replaced) [pdf, other]
Title: Hybrid Imitation Learning for Real-Time Service Restoration in Resilient Distribution Systems
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
[115]  arXiv:2012.02877 (replaced) [pdf, other]
Title: Multi-Source Data Fusion Outage Location in Distribution Systems via Probabilistic Graph Models
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)
[116]  arXiv:2012.04494 (replaced) [pdf, other]
Title: Bayesian Learning of LF-MMI Trained Time Delay Neural Networks for Speech Recognition
Comments: Published in TASLP
Subjects: Audio and Speech Processing (eess.AS)
[117]  arXiv:2012.04515 (replaced) [pdf, other]
Title: Digital Gimbal: End-to-end Deep Image Stabilization with Learnable Exposure Times
Comments: CVPR 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[118]  arXiv:2012.05004 (replaced) [pdf, other]
Title: Modeling and Identification of Low Rank Vector Processes
Comments: A more detailed version of the submission with the same name to IFAC SYSID 2021
Subjects: Systems and Control (eess.SY)
[119]  arXiv:2012.07721 (replaced) [pdf, other]
Title: Non-linear State-space Model Identification from Video Data using Deep Encoders
Comments: Accepted to SYSID 2021 (revised with reviewer feedback)
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
[120]  arXiv:2012.09913 (replaced) [pdf, other]
Title: Quantifying the Unknown: Impact of Segmentation Uncertainty on Image-Based Simulations
Subjects: Computational Engineering, Finance, and Science (cs.CE); Image and Video Processing (eess.IV)
[121]  arXiv:2012.14580 (replaced) [pdf, other]
Title: Synchronization with prescribed transient behavior: Heterogeneous multi-agent systems under funnel coupling Extended arXiv version
Subjects: Systems and Control (eess.SY)
[122]  arXiv:2101.01496 (replaced) [pdf, other]
Title: An efficient feature-preserving PDE algorithm for image denoising based on a spatial-fractional anisotropic diffusion equation
Comments: 23 pages, 8 figures
Subjects: Numerical Analysis (math.NA); Image and Video Processing (eess.IV)
[123]  arXiv:2101.08390 (replaced) [pdf, other]
Title: An Information-Theoretic Analysis of the Impact of Task Similarity on Meta-Learning
Comments: Accepted to ISIT 2021
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Signal Processing (eess.SP)
[124]  arXiv:2101.08625 (replaced) [pdf, other]
Title: Noisy-target Training: A Training Strategy for DNN-based Speech Enhancement without Clean Speech
Subjects: Audio and Speech Processing (eess.AS)
[125]  arXiv:2101.10139 (replaced) [pdf, ps, other]
Title: Estimates for solutions of homogeneous time-delay systems: Comparison of Lyapunov-Krasovskii and Lyapunov-Razumikhin techniques
Comments: This paper has been submitted to International Journal of Control
Subjects: Systems and Control (eess.SY); Dynamical Systems (math.DS)
[126]  arXiv:2103.07087 (replaced) [pdf, other]
Title: iToF2dToF: A Robust and Flexible Representation for Data-Driven Time-of-Flight Imaging
Comments: 32 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[127]  arXiv:2103.11016 (replaced) [pdf, ps, other]
Title: Multi-Robot Dynamical Source Seeking in Unknown Environments
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
[128]  arXiv:2104.01309 (replaced) [pdf, other]
Title: Tighter bounds on transient moments of stochastic chemical systems
Comments: corrected typos and added implementation details
Subjects: Chemical Physics (physics.chem-ph); Systems and Control (eess.SY); Optimization and Control (math.OC)
[129]  arXiv:2104.02588 (replaced) [pdf]
Title: Principal Component Analysis Applied to Gradient Fields in Band Gap Optimization Problems for Metamaterials
Subjects: Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[130]  arXiv:2104.08427 (replaced) [pdf, other]
Title: Models and Predictive Control for Nonplanar Vehicle Navigation
Comments: Added appendices, corrected typos
Subjects: Systems and Control (eess.SY)
[131]  arXiv:2104.08892 (replaced) [pdf]
Title: Internet of Fly Things For Post-Disaster Recovery Based on Multi-environment
Subjects: Signal Processing (eess.SP)
[132]  arXiv:2104.14365 (replaced) [pdf, other]
Title: On the Design and Analysis of Multivariable Extremum Seeking Control using Fast Fourier Transform
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
[133]  arXiv:2105.00231 (replaced) [pdf]
Title: Normalization of regressor excitation as a part of dynamic regressor extension and mixing procedure
Comments: 13 pages, 3 figures
Subjects: Methodology (stat.ME); Systems and Control (eess.SY)
[134]  arXiv:2105.02395 (replaced) [pdf, ps, other]
Title: Weighted Sum-Rate Maximization for Multi-Hop RIS-Aided Multi-User Communications: A Minorization-Maximization Approach
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)
