Electrical Engineering and Systems Science

New submissions for Fri, 30 Jul 21

[1]  arXiv:2107.13595 [pdf]
Title: Artificial Intelligence-Based Image Enhancement in PET Imaging: Noise Reduction and Resolution Enhancement
Subjects: Image and Video Processing (eess.IV)

High noise and low spatial resolution are two key confounding factors that limit the qualitative and quantitative accuracy of PET images. AI models for image denoising and deblurring are becoming increasingly popular for post-reconstruction enhancement of PET images. We present here a detailed review of recent efforts for AI-based PET image enhancement with a focus on network architectures, data types, loss functions, and evaluation metrics. We also highlight emerging areas in this field that are quickly gaining popularity, identify barriers to large-scale adoption of AI models for PET image enhancement, and discuss future directions.

[2]  arXiv:2107.13616 [pdf, other]
Title: Proposal-based Few-shot Sound Event Detection for Speech and Environmental Sounds with Perceivers
Subjects: Audio and Speech Processing (eess.AS); Neural and Evolutionary Computing (cs.NE); Sound (cs.SD)

There are many important applications for detecting and localizing specific sound events within long, untrimmed recordings, including keyword spotting, medical observation, and bioacoustic monitoring for conservation. Deep learning techniques often set the state of the art for these tasks. However, for some types of events, there is insufficient labeled data to train deep learning models. In this paper, we propose novel approaches to few-shot sound event detection that use region proposals and the Perceiver architecture, and that can accurately localize sound events with very few examples of each class of interest. Motivated by a lack of suitable benchmark datasets for few-shot audio event detection, we generate and evaluate on two novel episodic rare sound event datasets: one using clips of celebrity speech as the sound event, and the other using environmental sounds. Our best-performing few-shot approaches achieve F1-scores of 0.575 and 0.672, respectively, on 5-shot 5-way tasks on these two datasets. These represent absolute improvements of 0.200 and 0.234 over strong proposal-free few-shot sound event detection baselines.
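
The episodic "5-shot 5-way" protocol mentioned above can be illustrated with a minimal episode sampler. This is a generic sketch, not the authors' data pipeline; the function name `sample_episode`, the dictionary-of-clips input format and the default of two query clips per class are hypothetical choices.

```python
import random
from typing import Dict, List, Optional, Tuple

Example = Tuple[str, str]  # (clip identifier, class label)

def sample_episode(
    clips_by_class: Dict[str, List[str]],
    n_way: int = 5,
    k_shot: int = 5,
    n_query: int = 2,  # hypothetical number of query clips per class
    rng: Optional[random.Random] = None,
) -> Tuple[List[Example], List[Example]]:
    """Draw one N-way K-shot episode: a labelled support set and a disjoint query set."""
    if rng is None:
        rng = random.Random()
    # Only classes with enough clips for both support and query can be used.
    eligible = [c for c, clips in clips_by_class.items() if len(clips) >= k_shot + n_query]
    if len(eligible) < n_way:
        raise ValueError("not enough classes with k_shot + n_query clips")
    support: List[Example] = []
    query: List[Example] = []
    for label in rng.sample(eligible, n_way):
        # Sampling without replacement keeps support and query clips disjoint.
        picked = rng.sample(clips_by_class[label], k_shot + n_query)
        support += [(clip, label) for clip in picked[:k_shot]]
        query += [(clip, label) for clip in picked[k_shot:]]
    return support, query
```

With eight classes of ten clips each, one episode yields 25 support examples (5 classes × 5 shots) and 10 query examples.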

[3]  arXiv:2107.13634 [pdf, other]
Title: Neural Remixer: Learning to Remix Music with Interactive Control
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

The task of manipulating the level and/or effects of individual instruments to recompose a mixture recording, or remixing, is common across a variety of applications such as music production, audio-visual post-production, podcasts, and more. This process, however, traditionally requires access to the individual source recordings, which restricts the creative process. To work around this, source separation algorithms can separate a mixture into its respective components. Then, a user can adjust their levels and mix them back together. This two-step approach, however, still suffers from audible artifacts and motivates further work. In this work, we seek to learn to remix music directly. To do this, we propose two neural remixing architectures that extend Conv-TasNet to remix via either a) the source estimates directly or b) their latent representations. Both methods leverage a remixing data augmentation scheme as well as a mixture reconstruction loss to achieve an end-to-end separation and remixing process. We evaluate our methods using the Slakh and MUSDB datasets and report both source separation performance and remixing quality. Our results suggest that learning to remix significantly outperforms a strong separation baseline, is particularly useful for small changes, and can provide interactive user controls.
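
The remixing target itself is simple to state: scale each source by a user-chosen gain and sum the results, so that unit gains reproduce the original mixture. A minimal sketch, assuming sources are equal-length lists of samples; `remix` and `mixture_reconstruction_error` are hypothetical helpers, and the mean-absolute-error form is an assumed choice rather than the loss defined in the paper.

```python
from typing import List, Sequence

def remix(sources: Sequence[Sequence[float]], gains: Sequence[float]) -> List[float]:
    """Return sum_i gains[i] * sources[i], computed sample by sample."""
    if len(sources) != len(gains):
        raise ValueError("one gain per source is required")
    length = len(sources[0])
    if any(len(s) != length for s in sources):
        raise ValueError("all sources must have the same length")
    return [sum(g * s[t] for g, s in zip(gains, sources)) for t in range(length)]

def mixture_reconstruction_error(
    estimates: Sequence[Sequence[float]], mixture: Sequence[float]
) -> float:
    """Mean absolute difference between the summed source estimates and the mixture."""
    summed = remix(estimates, [1.0] * len(estimates))
    return sum(abs(a - b) for a, b in zip(summed, mixture)) / len(mixture)
```

Setting a gain to 0.0 mutes a source and a gain of 2.0 doubles its level; an end-to-end remixer is trained to produce such targets without an explicit separate-then-sum step.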

[4]  arXiv:2107.13703 [pdf, other]
Title: A Similarity Measure of Histopathology Images by Deep Embeddings
Comments: 4 Pages, 2 figures
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Histopathology digital scans are large images that contain valuable information at the pixel level. Content-based comparison of these images is a challenging task. This study proposes a content-based similarity measure for high-resolution gigapixel histopathology images. The proposed similarity measure is an extension of cosine vector similarity to matrices. Each image is divided into same-size patches with a meaningful amount of information (i.e., patches that contain enough tissue). The similarity is computed from patch-level deep embeddings extracted from the last pooling layer of a pre-trained deep model at four different magnification levels, namely 1x, 2.5x, 5x, and 10x. In addition, embedding reduction is investigated for faster measurement. Finally, to assess the proposed method, an image search method is implemented. Results show that the similarity measure represents the slide labels with a maximum accuracy of 93.18% for top-5 search at 5x magnification.
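
One plausible way to extend cosine similarity from vectors to the patch-embedding matrices of two slides is to compute all pairwise patch cosines and average each patch's best match in both directions. This aggregation rule is an illustrative assumption, not necessarily the paper's exact definition; `slide_similarity` is a hypothetical name.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]

def cosine(u: Vector, v: Vector) -> float:
    """Standard cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def slide_similarity(A: List[Vector], B: List[Vector]) -> float:
    """Symmetric best-match average of patch-level cosine similarities.

    A and B are the rows (patch embeddings) of two slides' embedding matrices;
    the slides may contribute different numbers of patches.
    """
    sims = [[cosine(a, b) for b in B] for a in A]
    a_to_b = sum(max(row) for row in sims) / len(A)
    b_to_a = sum(max(sims[i][j] for i in range(len(A))) for j in range(len(B))) / len(B)
    return 0.5 * (a_to_b + b_to_a)
```

A slide compared with itself scores 1.0, and slides whose patch embeddings are mutually orthogonal score 0.0.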

[5]  arXiv:2107.13740 [pdf, other]
Title: Three-dimensional instantaneous orbit map for rotor-bearing system based on a novel multivariable complex variational mode decomposition algorithm
Subjects: Signal Processing (eess.SP)

Full spectrum and holospectrum are homogeneous information fusion techniques developed for the fault diagnosis of rotating machinery, and they are often used to analyze the orbit of rotating machinery. However, both techniques are based on the Fourier transform, so they can only handle stationary signals, which limits their application. Drawing inspiration from multivariate variational mode decomposition (MVMD) and complex-valued signal decomposition, we propose in this work a method called multivariate complex variational mode decomposition (MCVMD) for processing non-stationary complex-valued signals of multi-dimensional bearing surfaces. In particular, the proposed method takes advantage of the joint information between the complex-valued signals of multi-dimensional bearing surfaces. Owing to this property, we provide a three-dimensional instantaneous orbit map (3D-IOM) that presents an overall perspective of the rotor-bearing system, and we also offer a high-resolution time-full spectrum (Time-FS) that displays the forward and backward frequency components of all the bearing surfaces within a time-frequency plane. The effectiveness of the proposed method is demonstrated on both simulated signals and real-life complex-valued signals.
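
The forward/backward distinction on which full spectrum relies can be shown in a few lines: combine two orthogonal probe signals into a complex signal z = x + jy and take its DFT; positive-frequency bins then correspond to forward whirl and negative-frequency bins to backward whirl. This is a minimal stdlib sketch of the classical Fourier-based full spectrum (not MCVMD); the naive O(N²) DFT is used for clarity only.

```python
import cmath
import math
from typing import List, Sequence, Tuple

def full_spectrum(x: Sequence[float], y: Sequence[float]) -> List[Tuple[int, float]]:
    """Return (bin, magnitude) pairs of the DFT of z = x + jy.

    Bins run from -N//2 to N - N//2 - 1; positive bins are forward whirl,
    negative bins are backward whirl.
    """
    n = len(x)
    z = [complex(a, b) for a, b in zip(x, y)]
    spectrum = []
    for k in range(-(n // 2), n - n // 2):
        coeff = sum(z[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) / n
        spectrum.append((k, abs(coeff)))
    return spectrum
```

For x = cos(ωt) and y = sin(ωt) (a forward circular orbit) the peak appears at the positive bin of ω; flipping the sign of y reverses the whirl direction and moves the peak to the negative bin.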

[6]  arXiv:2107.13761 [pdf, other]
Title: A Novel Passivity-Based Trajectory Tracking Control For Conservative Mechanical Systems
Authors: Robert Mahony
Journal-ref: Proceedings of the IEEE Conference on Decision and Control, pp. 4259-4266 (2019)
Subjects: Systems and Control (eess.SY)

Most passivity-based trajectory tracking algorithms for mechanical systems can only stabilise reference trajectories that have constant energy. This paper overcomes this limitation by deriving a single-variable Hamiltonian model for the reference trajectory and solving along the constrained trajectory to obtain a reference potential. This potential is then used as the model to shape the energy of the true system such that its free solutions include the desired reference trajectory. The proposed trajectory tracking algorithm interconnects the reference and true systems through a virtual spring-damper, along with an outer-loop energy pump/damper that stabilises the desired energy level of the interconnected system, ensuring asymptotic tracking of the desired trajectory. The resulting algorithm is a fully energy-based trajectory tracking control for non-stationary trajectories of conservative mechanical systems.

[7]  arXiv:2107.13767 [pdf, ps, other]
Title: Edge computing in 5G cellular networks for real-time analysis of electrocardiography recorded with wearable textile sensors
Comments: Accepted at the 2021 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC)
Subjects: Signal Processing (eess.SP)

Fifth-generation (5G) cellular networks promise higher data rates, lower latency, and large numbers of interconnected devices. In doing so, 5G will provide important steps towards unlocking the full potential of the Internet of Things (IoT). In this work, we propose a lightweight IoT platform for continuous vital sign analysis. Electrocardiography (ECG) is acquired via textile sensors and continuously sent from a smartphone to an edge device using cellular networks. The edge device applies a state-of-the-art deep learning model to perform a binary, end-to-end classification of whether a myocardial infarction is present. Using this infrastructure, experiments with four volunteers were conducted. We compare third-, fourth-, and fifth-generation (Release 15) cellular networks with respect to transmission latency, data corruption, and duration of machine learning inference. The best performance is achieved using 5G, with an average transmission latency of 110 ms and data corruption in 0.07% of ECG samples. Deep learning inference took approximately 170 ms. In conclusion, 5G cellular networks in combination with edge devices are a suitable infrastructure for continuous vital sign analysis using deep learning models. Future 5G releases will introduce multi-access edge computing (MEC) as a paradigm for bringing edge devices nearer to mobile clients. This will decrease transmission latency and eventually enable automatic emergency alerting in near real time.

[8]  arXiv:2107.13776 [pdf, other]
Title: Downlink Analysis of LEO Multi-Beam Satellite Communication in Shadowed Rician Channels
Comments: Submitted to the IEEE Globecom 2021
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)

The coming extension of cellular technology to base-stations in low-earth orbit (LEO) requires a fresh look at terrestrial 3GPP channel models. Relative to such models, sky-to-ground cellular channels will exhibit less diffraction, deeper shadowing, larger Doppler shifts, and possibly far stronger cross-cell interference: consequences of high elevation angles and extreme "sectorization" of LEO satellite transmissions into partially-overlapping spot beams. To permit forecasting of expected signal-to-noise ratio (SNR), interference-to-noise ratio (INR) and probability of outage, we characterize the powers of desired and interference signals as received by ground users from such a LEO satellite. In particular, building on the Shadowed Rician channel model, we observe that co-cell and cross-cell sky-to-ground signals travel along similar paths, whereas terrestrial co- and cross-cell signals travel along very different paths. We characterize SNR, signal-to-interference ratio (SIR), and INR using transmit beam profiles and linear relationships that we establish between certain Shadowed Rician random variables. These tools allow us to simplify certain density functions and moments, facilitating future analysis. Numerical results yield insight into the key question of whether emerging LEO systems should be viewed as interference- or noise-limited.
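
The three ratios characterized above are linked by standard identities: with desired power S, interference power I and noise power N, SIR = S/I = SNR/INR (so SIR in dB equals SNR in dB minus INR in dB), and SINR = S/(I + N) = SNR/(1 + INR). A small sketch of these relations; the function names are illustrative, and treating INR > 0 dB as "interference-limited" is a simple rule of thumb rather than the paper's criterion.

```python
import math

def db(x: float) -> float:
    """Linear power ratio to decibels."""
    return 10.0 * math.log10(x)

def from_db(x_db: float) -> float:
    """Decibels to linear power ratio."""
    return 10.0 ** (x_db / 10.0)

def link_ratios(snr_db: float, inr_db: float) -> dict:
    """Derive SIR and SINR (in dB) from SNR and INR given in dB."""
    snr, inr = from_db(snr_db), from_db(inr_db)
    return {
        "sir_db": snr_db - inr_db,           # S/I = (S/N) / (I/N)
        "sinr_db": db(snr / (1.0 + inr)),    # S/(I+N) = SNR / (1 + INR)
        "interference_limited": inr > 1.0,   # rule of thumb: interference power exceeds noise power
    }
```

For example, SNR = 10 dB with INR = 0 dB gives SIR = 10 dB but SINR of about 7 dB, since interference equal to the noise floor halves the effective signal-to-impairment ratio.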

[9]  arXiv:2107.13819 [pdf, other]
Title: Sparse Joint Transmission for Cloud Radio Access Networks with Limited Fronthaul Capacity
Comments: arXiv admin note: text overlap with arXiv:1912.05231
Subjects: Signal Processing (eess.SP)

A cloud radio access network (C-RAN) is a promising cellular network architecture in which densely deployed multi-antenna remote radio heads (RRHs) jointly serve many users using the same time-frequency resource. Because of the extremely high signaling overhead for both channel state information (CSI) acquisition and data sharing at a baseband unit (BBU), a joint transmission strategy with significantly reduced signaling overhead is indispensable for achieving the cooperation gain in practical C-RANs. In this paper, we present a novel sparse joint transmission (sparse-JT) method for C-RANs in which the number of transmit antennas per unit area is much larger than the active downlink user density. Considering the effects of noisy and incomplete CSI and of the quantization errors in data sharing caused by finite-rate fronthaul capacity, the key innovation of sparse-JT is to find a joint solution for cooperative RRH clusters, beamforming vectors, and power allocation that maximizes a lower bound of the sum spectral efficiency under a sparsity constraint on the active RRHs. To find such a solution, we present a computationally efficient algorithm that is guaranteed to find a locally optimal solution for a relaxed sum-spectral-efficiency maximization problem. Through system-level simulations, we show that sparse-JT provides significant gains in ergodic spectral efficiency compared to existing joint transmission methods.

[10]  arXiv:2107.13820 [pdf]
Title: The interpretation of endobronchial ultrasound image using 3D convolutional neural network for differentiating malignant and benign mediastinal lesions
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

The purpose of this study is to differentiate malignant and benign mediastinal lesions by applying a three-dimensional convolutional neural network to endobronchial ultrasound (EBUS) images. Compared with previous studies, our proposed model is robust to noise and able to fuse various imaging features and the spatiotemporal features of EBUS videos. Endobronchial ultrasound-guided transbronchial needle aspiration (EBUS-TBNA) is a diagnostic tool for intrathoracic lymph nodes. Physicians can observe the characteristics of the lesion using grayscale mode, Doppler mode, and elastography during the procedure. To process the EBUS data in the form of a video and appropriately integrate the features of multiple imaging modes, we used a time-series three-dimensional convolutional neural network (3D CNN) to learn the spatiotemporal features and designed a variety of architectures to fuse the imaging modes. Our model (Res3D_UDE) took grayscale mode, Doppler mode, and elastography as training data and achieved an accuracy of 82.00% and an area under the curve (AUC) of 0.83 on the validation set. Compared with previous studies, we directly used videos recorded during the procedure as training and validation data, without additional manual selection, which might make clinical application easier. In addition, a model designed with a 3D CNN can effectively learn spatiotemporal features and improve accuracy. In the future, our model may be used to guide physicians to quickly and correctly find the target lesions for slice sampling during the inspection process, reduce the number of slices taken from benign lesions, and shorten the inspection time.

[11]  arXiv:2107.13822 [pdf, other]
Title: Semi-supervised Learning for Data-driven Soft-sensing of Biological and Chemical Processes
Comments: 32 pages, 11 figures
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)

Continuously operated (bio-)chemical processes increasingly suffer from external disturbances, such as feed fluctuations or changes in market conditions. Product quality often hinges on the control of concentrations that are expensive and therefore rarely measured. Semi-supervised regression, a machine learning method, is a possible building block for constructing soft sensors for such infrequently measured states. Using two case studies, the Williams-Otto process and a bioethanol production process, semi-supervised regression is compared against standard regression to evaluate its merits and its possible scope of application for process control in the (bio-)chemical industry.

[12]  arXiv:2107.13826 [pdf]
Title: Adaptive Sampling of Dynamic Systems for Generation of Fast and Accurate Surrogate Models
Comments: 13 pages, 8 figures
Subjects: Systems and Control (eess.SY)

For economic nonlinear model predictive control and dynamic real-time optimization, fast and accurate models are necessary. Consequently, the use of dynamic surrogate models to mimic complex rigorous models is increasingly coming into focus. For dynamic systems, the focus so far has been on identifying a system's behavior around a steady-state operating point. In this contribution, we propose a novel methodology to adaptively sample rigorous dynamic process models in order to generate a dataset for building dynamic surrogate models. The goal of the developed algorithm is to cover as large an area of the original model's feasible region as possible. To demonstrate the performance of the presented framework, it is applied to a dynamic model of a chlor-alkali electrolysis.

[13]  arXiv:2107.13833 [pdf, other]
Title: Recurrent U-net for automatic pelvic floor muscle segmentation on 3D ultrasound
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

The prevalence of pelvic floor problems is high within the female population. Transperineal ultrasound (TPUS) is the main imaging modality used to investigate these problems. Automating the analysis of TPUS data will help in growing our understanding of pelvic floor related problems. In this study, we present a U-net-like neural network with convolutional long short-term memory (CLSTM) layers to automate the 3D segmentation of the levator ani muscle (LAM) in TPUS volumes. The CLSTM layers are added to preserve the inter-slice 3D information. We reach human-level performance on this segmentation task. Therefore, we conclude that we successfully automated the segmentation of the LAM on 3D TPUS data. This paves the way towards automatic in-vivo analysis of the LAM mechanics in the context of large study populations.

[14]  arXiv:2107.13838 [pdf, other]
Title: Heterogeneously-Distributed Joint Radar Communications: Bayesian Resource Allocation
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)

Due to spectrum scarcity, the coexistence of radar and wireless communication has recently gained substantial research interest. Among many scenarios, the heterogeneously-distributed joint radar-communication system is promising due to its flexibility and compatibility with existing architectures. In this paper, we focus on a heterogeneous radar and communication network (HRCN), which consists of various generic radars for multiple target tracking (MTT) and wireless communications for multiple users. We aim to improve the MTT performance and maintain good throughput levels for communication users through a well-designed resource allocation. The problem is formulated as a Bayesian Cramér-Rao bound (CRB) based minimization subject to resource budgets and throughput constraints. The formulated nonconvex problem is solved with an alternating descent-ascent approach. Numerical results demonstrate the efficacy of the proposed allocation scheme for this heterogeneous network.

[15]  arXiv:2107.13851 [pdf, other]
Title: Tensor-Based Channel Estimation and Reflection Design for RIS-Aided Millimeter-Wave MIMO Communication Systems
Comments: Asilomar 2021
Subjects: Signal Processing (eess.SP)

In this work, we consider both the channel estimation and the reflection design problems in point-to-point reconfigurable intelligent surface (RIS)-aided millimeter-wave (mmWave) MIMO communication systems. First, we show that by exploiting the low-rank nature of mmWave MIMO channels, the received training signals can be written as a low-rank multi-way tensor admitting a canonical polyadic (CP) decomposition. Utilizing this structure, a tensor-based RIS channel estimation method (termed TenRICE) is proposed, wherein the tensor factor matrices are estimated using an alternating least squares method. Using TenRICE, the transmitter-to-RIS and RIS-to-receiver channels are efficiently and separately estimated, up to a trivial scaling factor. We then formulate the beamforming and RIS reflection design as a spectral efficiency maximization problem. Due to its non-convexity, we propose a heuristic non-iterative two-step method, where the RIS reflection vector is obtained in closed form using a Frobenius-norm maximization (FroMax) strategy. Our numerical results show that TenRICE outperforms benchmark methods, approaching the Cramér-Rao lower bound with a low training overhead. Moreover, we show that FroMax achieves performance comparable to benchmark methods with lower complexity.
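
The alternating least squares idea behind CP-based estimators can be illustrated in its simplest, rank-one form for a 3-way tensor: fix two factors, solve for the third in closed form, and cycle. This is a generic textbook sketch, not the TenRICE algorithm; `outer3` and `rank1_cp_als` are hypothetical helpers, and, as noted in the abstract for the channel factors, the factors are recovered only up to scaling.

```python
from typing import List, Sequence, Tuple

Tensor3 = List[List[List[float]]]

def outer3(a: Sequence[float], b: Sequence[float], c: Sequence[float]) -> Tensor3:
    """Rank-1 tensor T[i][j][k] = a[i] * b[j] * c[k]."""
    return [[[ai * bj * ck for ck in c] for bj in b] for ai in a]

def rank1_cp_als(T: Tensor3, iters: int = 50) -> Tuple[List[float], List[float], List[float]]:
    """Rank-1 CP decomposition T ~ a o b o c by alternating least squares.

    Each update is the exact least-squares solution for one factor with the
    other two held fixed. All-ones initialization is a simple choice that
    fails if T contracted with all-ones vectors vanishes.
    """
    I, J, K = len(T), len(T[0]), len(T[0][0])
    b, c = [1.0] * J, [1.0] * K
    for _ in range(iters):
        nb, nc = sum(v * v for v in b), sum(v * v for v in c)
        a = [sum(T[i][j][k] * b[j] * c[k] for j in range(J) for k in range(K)) / (nb * nc)
             for i in range(I)]
        na = sum(v * v for v in a)
        b = [sum(T[i][j][k] * a[i] * c[k] for i in range(I) for k in range(K)) / (na * nc)
             for j in range(J)]
        nb = sum(v * v for v in b)
        c = [sum(T[i][j][k] * a[i] * b[j] for i in range(I) for j in range(J)) / (na * nb)
             for k in range(K)]
    return a, b, c
```

For higher CP ranks, the same alternating scheme solves a matrix least-squares problem for each factor matrix instead of the scalar divisions used here.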

[16]  arXiv:2107.13890 [pdf, other]
Title: Frequency dynamics of the Northern European AC/DC power system: a look-ahead study
Subjects: Systems and Control (eess.SY)

A large share of renewable energy sources integrated into the national grids and increased interconnection capacity with asynchronous networks are the main contributors to the reduction of kinetic energy storage in the Nordic Power System. Challenges arise in operating the system after a loss of generation and in minimizing the interaction of offshore systems with onshore grids. To assess these challenges, a novel dynamic model was developed under the phasor approximation to represent the future Northern European Power System, including High Voltage Direct Current (HVDC) links and future offshore energy islands in the North Sea. First, the future frequency response is provided for two large disturbances to highlight the benefit of the developed model and point out potential future frequency issues. Next, further actions are investigated to better utilize the existing frequency containment reserves or to partially replace them using emergency droop-based power control of HVDC links. Lastly, the offshore grid interaction and frequency support to the Nordic network are investigated, and the dynamic response is compared for zero- and low-inertia designs of the offshore energy islands. The simulations were performed using industrial software, and the associated material is made publicly available.
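
Emergency droop-based power control of an HVDC link, in its generic form, changes the transferred power in proportion to the frequency deviation beyond a deadband, within a power limit. The sketch below illustrates only that generic control law; the function name, gain, deadband and limit values are hypothetical and not taken from the study.

```python
def hvdc_droop_power(
    f_hz: float,
    p_set_mw: float,
    f_nom_hz: float = 50.0,
    k_mw_per_hz: float = 300.0,    # hypothetical droop gain
    deadband_hz: float = 0.1,      # hypothetical deadband
    max_delta_mw: float = 200.0,   # hypothetical limit on the emergency power change
) -> float:
    """Power order for an HVDC link importing into the monitored AC area.

    Under-frequency (f < f_nom) increases the import; over-frequency decreases it.
    """
    deviation = f_nom_hz - f_hz
    if abs(deviation) <= deadband_hz:
        return p_set_mw
    # Only the part of the deviation beyond the deadband is acted upon.
    effective = deviation - deadband_hz if deviation > 0 else deviation + deadband_hz
    delta = max(-max_delta_mw, min(max_delta_mw, k_mw_per_hz * effective))
    return p_set_mw + delta
```

With these illustrative settings, a dip to 49.6 Hz raises a 500 MW import to about 590 MW, while a dip to 48.0 Hz saturates at the 200 MW limit (700 MW).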

[17]  arXiv:2107.13981 [pdf, other]
Title: Classical Risk-Averse Control for Finite-Horizon Borel Models
Comments: under review
Subjects: Systems and Control (eess.SY)

We study a risk-averse optimal control problem with a finite-horizon Borel model, where the cost is assessed via exponential utility. The setting permits non-linear dynamics, non-quadratic costs, and continuous spaces but is less general than the problem of optimizing an expected utility. Our contribution is to show the existence of an optimal risk-averse controller through the use of measure-theoretic first principles.
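
Assessing a random cost C via exponential utility leads to the risk-sensitive criterion (1/θ) log E[exp(θC)] with θ > 0. As θ → 0 it approaches E[C], and for θ > 0 it penalizes variability; for a Gaussian cost it equals E[C] + θ Var[C] / 2. A minimal sample-based sketch; the parameter name `theta` and the sample-average estimator are assumptions for illustration, not the paper's measure-theoretic construction.

```python
import math
from typing import Sequence

def exponential_utility_cost(costs: Sequence[float], theta: float) -> float:
    """Sample estimate of (1/theta) * log E[exp(theta * C)] for theta > 0.

    Uses the log-sum-exp trick so that large theta * cost values do not overflow.
    """
    if theta <= 0:
        raise ValueError("theta must be positive for a risk-averse criterion")
    scaled = [theta * c for c in costs]
    m = max(scaled)
    log_mean_exp = m + math.log(sum(math.exp(s - m) for s in scaled) / len(scaled))
    return log_mean_exp / theta
```

For two equally likely costs 0 and 2, the criterion with θ = 1 is log((1 + e²)/2) ≈ 1.43, above the mean cost of 1, reflecting the aversion to the spread.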

[18]  arXiv:2107.14038 [pdf, other]
Title: Point-Cloud Deep Learning of Porous Media for Permeability Prediction
Subjects: Image and Video Processing (eess.IV); Machine Learning (cs.LG)

We propose a novel deep learning framework for predicting the permeability of porous media from their digital images. Instead of feeding the whole image volume to the network, as convolutional neural networks do, we model the boundary between the solid matrix and the pore spaces as point clouds and feed them as inputs to a neural network based on the PointNet architecture. This approach overcomes the memory limitations of graphics processing units and their consequences for the choice of batch size and for convergence. Compared to convolutional neural networks, the proposed deep learning methodology provides the freedom to select larger batch sizes, because it significantly reduces the size of the network inputs. Specifically, we use the classification branch of PointNet and adjust it for a regression task. As a test case, two- and three-dimensional synthetic digital rock images are considered. We investigate the effect of different components of our neural network on its performance. We compare our deep learning strategy with a convolutional neural network from various perspectives, specifically the maximum possible batch size. We inspect the generalizability of our network by predicting the permeability of real-world rock samples as well as synthetic digital rocks that are statistically different from the samples used during training. The network predicts the permeability of digital rocks a few thousand times faster than a Lattice Boltzmann solver, with a high level of prediction accuracy.
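
Representing the solid/pore interface as a point cloud can be sketched for a 2D binary image by keeping the coordinates of solid pixels that touch at least one pore pixel (4-connectivity). This extraction rule is an assumption for illustration, since the abstract does not specify it, and `boundary_point_cloud` is a hypothetical name. Because PointNet aggregates per-point features with a symmetric function (max pooling), the order of the returned points does not matter.

```python
from typing import List, Tuple

def boundary_point_cloud(image: List[List[int]]) -> List[Tuple[int, int]]:
    """Return (row, col) of solid pixels (1) with at least one 4-neighbour pore pixel (0)."""
    rows, cols = len(image), len(image[0])
    points = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1:
                continue
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and image[nr][nc] == 0:
                    points.append((r, c))
                    break  # one pore neighbour is enough
    return points
```

For a 3×3 solid block with a single central pore, only the four solid pixels adjacent to the pore are kept, so the network input shrinks from nine pixels to four points; the saving grows with image size because interfaces scale with area rather than volume.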

[19]  arXiv:2107.14133 [pdf]
Title: Sub-Nyquist Sampling with Optical Pulses for Photonic Blind Source Separation
Comments: Frontiers in Optics
Subjects: Signal Processing (eess.SP)

We propose and demonstrate an optical pulse sampling method for photonic blind source separation. It can separate wideband mixed signals using a low sampling frequency, which reduces the digital signal processing workload.

[20]  arXiv:2107.14134 [pdf]
Title: Photonic Interference Cancellation with Hybrid Free Space Optical Communication and MIMO Receiver
Comments: Frontiers in Optics 2021
Subjects: Signal Processing (eess.SP)

We propose and demonstrate a hybrid blind source separation system that can switch between a multiple-input multiple-output (MIMO) mode and a free-space optical communication mode, depending on the situation, to obtain the best conditions for separation.

[21]  arXiv:2107.14135 [pdf, other]
Title: Modifications of FastICA in Convolutive Blind Source Separation
Authors: YunPeng Li
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

Convolutive blind source separation (BSS) aims to recover unknown components from their convolutive mixtures. Unlike in the instantaneous case, the spatio-temporal prewhitening stage and the para-unitary filter constraint are difficult to implement in a convolutive context. In this paper, we propose several modifications of FastICA to alleviate these difficulties. Our method performs a simple prewhitening step on the convolutive mixtures prior to separation and optimizes the contrast function under a diagonalization constraint implemented by singular value decomposition (SVD). Numerical simulations are carried out to verify the performance of the proposed method.
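
For orientation, in standard instantaneous FastICA with whitened data z and nonlinearity g, each unmixing row is updated by a fixed-point step, and the rows are kept decorrelated by symmetric orthogonalization, which an SVD makes explicit. This is the textbook step only; how the paper adapts the constraint to the convolutive case is not reproduced here.

```latex
\mathbf{w}_i \leftarrow \mathbb{E}\{\mathbf{z}\, g(\mathbf{w}_i^{\top}\mathbf{z})\}
  - \mathbb{E}\{g'(\mathbf{w}_i^{\top}\mathbf{z})\}\,\mathbf{w}_i,
\qquad
\mathbf{W} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^{\top}
\;\Longrightarrow\;
(\mathbf{W}\mathbf{W}^{\top})^{-1/2}\mathbf{W}
  = \mathbf{U}\boldsymbol{\Sigma}^{-1}\mathbf{U}^{\top}\,\mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^{\top}
  = \mathbf{U}\mathbf{V}^{\top}
```

Here W stacks the rows w_i^T, and W W^T = U Σ² U^T for a square, full-rank W, so the orthogonalized matrix U V^T is obtained directly from the SVD factors.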

[22]  arXiv:2107.14137 [pdf]
Title: Radio Frequency Interference Management with Free-Space Optical Communication and Photonic Signal Processing
Authors: Yang Qi, Ben Wu
Comments: Frontiers in Optics 2021
Subjects: Signal Processing (eess.SP)

We design and experimentally demonstrate a radio frequency interference management system with free-space optical communication and photonic signal processing. The system provides real-time interference cancellation in 6 GHz wide bandwidth.

[23]  arXiv:2107.14138 [pdf, other]
Title: Fast Beam Training for RIS-Assisted Uplink Communication
Comments: This is a codebook for the single-user case in RIS-assisted uplink communication. We have introduced a fine correction in the last stage of beam training.
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)

In this work, we propose a beam training codebook for Reconfigurable Intelligent Surface (RIS) assisted mmWave uplink communication. The beam training procedure is important for establishing a reliable link between the user node and the access point (AP). A codebook-based training procedure reduces the time the RIS controller needs to find the best possible phase shift for steering the beam incident on the RIS toward the receiving node. We consider a semi-passive RIS that assists the RIS controller with minimum-overhead feedback. It is shown that the procedure detects a mobile node with high probability in a short interval of time. Further, we use the same codebook at the user node to determine the desired direction of communication via the RIS.

[24]  arXiv:2107.14159 [pdf, other]
Title: Security of Distributed Parameter Cyber-Physical Systems: Cyber-Attack Detection in Linear Parabolic PDEs
Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)

Security of Distributed Parameter Cyber-Physical Systems (DPCPSs) is of critical importance in the face of cyber-attack threats. Although security aspects of Cyber-Physical Systems (CPSs) modelled by Ordinary Differential Equations (ODEs) have been extensively explored during the past decade, the security of DPCPSs has not received due attention despite their safety-critical nature. In this work, we explore the security aspects of DPCPSs from a system-theoretic viewpoint. Specifically, we focus on DPCPSs modelled by linear parabolic Partial Differential Equations (PDEs) subject to cyber-attacks in the actuation channel. First, we explore the detectability of such attacks and derive conditions for stealthy attacks. Next, we develop a design framework for cyber-attack detection algorithms based on output injection observers. Such attack detection algorithms explicitly consider stability, robustness and attack sensitivity in their design. Finally, theoretical analysis and simulation studies are performed to illustrate the effectiveness of the proposed approach.

[25]  arXiv:2107.14175 [pdf, other]
Title: Swap-Free Fat-Water Separation in Dixon MRI using Conditional Generative Adversarial Networks
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Dixon MRI is widely used for body composition studies. Current processing methods for large whole-body volumes are time-intensive and prone to artifacts during the fat-water separation performed on the scanner, making the data difficult to analyse. The most common artifacts are fat-water swaps, where the labels are inverted at the voxel level. It is common for researchers to discard swapped data (generally around 10%), which can be wasteful and lead to unintended biases. The UK Biobank is acquiring Dixon MRI for over 100,000 participants, and thousands of swaps will occur. If these go undetected, errors will propagate into processes such as abdominal organ segmentation and dilute the results of population-based analyses. There is a clear need for a fast and robust method to accurately separate fat and water channels. In this work, we propose such a method based on style transfer using a conditional generative adversarial network. We also introduce a new Dixon loss function for the generator model. Using data from the UK Biobank Dixon MRI, our model is able to predict highly accurate fat and water channels that are free from artifacts. We show that the model separates fat and water channels using either a single input (in-phase) or dual inputs (in-phase and opposed-phase), with the latter producing improved results. Our proposed method enables faster and more accurate downstream analysis of body composition from Dixon MRI in population studies by eliminating the need for visual inspection or for discarding data due to fat-water swaps.
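
Why swaps occur is easiest to see in the idealized two-point Dixon model, where the in-phase signal is water + fat and the opposed-phase signal is water − fat, so water = (IP + OP)/2 and fat = (IP − OP)/2. If only the magnitude of the opposed-phase signal is available, voxels in which fat exceeds water get their labels exchanged. This is the textbook model only (ignoring field inhomogeneity and T2* effects), not the GAN-based method proposed in the paper; the function names are hypothetical.

```python
from typing import Tuple

def two_point_dixon(in_phase: float, opposed_phase: float) -> Tuple[float, float]:
    """Idealized two-point Dixon: IP = W + F and OP = W - F, solved for (W, F)."""
    water = (in_phase + opposed_phase) / 2.0
    fat = (in_phase - opposed_phase) / 2.0
    return water, fat

def magnitude_only_dixon(in_phase: float, opposed_phase: float) -> Tuple[float, float]:
    """Same separation when only |OP| is known; mislabels voxels where fat > water."""
    return two_point_dixon(abs(in_phase), abs(opposed_phase))
```

A voxel with water 30 and fat 70 gives IP = 100 and OP = -40; the signed separation returns (30, 70), whereas the magnitude-only version returns the swapped pair (70, 30).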

[26]  arXiv:2103.15990 (cross-list from cs.HC) [pdf, other]
Title: An Overview of Human Activity Recognition Using Wearable Sensors: Healthcare and Artificial Intelligence
Subjects: Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Signal Processing (eess.SP)

With the rapid development of the internet of things (IoT) and artificial intelligence (AI) technologies, human activity recognition (HAR) has been applied in a variety of domains such as security and surveillance, human-robot interaction, and entertainment. Even though a number of surveys and review papers have been published, there is a lack of HAR overview papers focusing on healthcare applications that use wearable sensors. We fill this gap with this overview paper. In particular, we present our projects to illustrate the system design of HAR applications for healthcare. Our projects include early mobility identification of human activities for intensive care unit (ICU) patients and gait analysis of Duchenne muscular dystrophy (DMD) patients. We cover essential components of designing HAR systems, including sensor factors (e.g., type, number, and placement location), AI model selection (e.g., classical machine learning models versus deep learning models), and feature engineering. In addition, we highlight the challenges of such healthcare-oriented HAR systems and propose several research opportunities for both the medical and the computer science communities.
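
Feature engineering for wearable-sensor HAR commonly starts by segmenting the sensor stream into overlapping windows and computing simple statistics per window, which a classical classifier then consumes. A generic sketch under assumed parameters: the window length, step and the mean/standard-deviation/range features are illustrative choices, not the pipeline used in the authors' projects.

```python
import statistics
from typing import Dict, List, Sequence

def window_features(signal: Sequence[float], window: int, step: int) -> List[Dict[str, float]]:
    """Split a 1-D sensor stream into windows and compute per-window statistics."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    features = []
    # Windows overlap whenever step < window; a trailing partial window is dropped.
    for start in range(0, len(signal) - window + 1, step):
        seg = signal[start:start + window]
        features.append({
            "start": start,
            "mean": statistics.fmean(seg),
            "std": statistics.pstdev(seg),
            "range": max(seg) - min(seg),
        })
    return features
```

For a multi-axis accelerometer, the same function would typically be applied per axis and the resulting dictionaries concatenated into one feature vector per window.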

[27]  arXiv:2107.12964 (cross-list from cs.CV) [pdf, other]
Title: A Physiologically-Adapted Gold Standard for Arousal during Stress
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)

Emotion is an inherently subjective psychophysiological human state, and producing an agreed-upon representation (gold standard) for continuous emotion requires a time-consuming and costly training procedure for multiple human annotators. There is strong evidence in the literature that physiological signals are sufficient objective markers for states of emotion, particularly arousal. In this contribution, we utilise a dataset which includes continuous emotion and physiological signals, namely Heartbeats per Minute (BPM), Electrodermal Activity (EDA), and Respiration-rate, captured during a stress-inducing scenario (Trier Social Stress Test). We utilise a Long Short-Term Memory Recurrent Neural Network to explore the benefit of fusing these physiological signals with arousal as the target, learning from various audio, video, and text-based features. We utilise the state-of-the-art MuSe-Toolbox to consider both annotation delay and inter-rater agreement weighting when fusing the target signals. An improvement in Concordance Correlation Coefficient (CCC) is seen across feature sets when fusing EDA with arousal, compared to the arousal-only gold standard results. Additionally, results with BERT-based textual features improved for arousal plus all physiological signals, obtaining up to .3344 CCC compared to .2118 CCC for arousal only. Multimodal fusion also improves overall CCC, with audio plus video features obtaining up to .6157 CCC to recognize arousal plus EDA and BPM.
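The Concordance Correlation Coefficient reported here rewards correlation but, unlike Pearson's r, also penalises differences in mean and scale between prediction and gold standard. A minimal sketch of Lin's CCC, assuming population (1/N) moments; the function name `ccc` is illustrative, and toolkits such as the MuSe-Toolbox may compute it slightly differently (for example, with sample rather than population variances).

```python
def ccc(x, y):
    """Lin's concordance correlation coefficient between two equal-length sequences.

    Uses population (1/N) moments: 2*cov / (var_x + var_y + (mean_x - mean_y)**2).
    """
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2 * cov / (vx + vy + (mx - my) ** 2)


print(ccc([1, 2, 3], [1, 2, 3]))  # 1.0
print(ccc([1, 2, 3], [2, 3, 4]))  # 0.571..., penalised for the constant offset
```

Both pairs above have a Pearson correlation of 1, but the constant offset in the second pair lowers its CCC to 4/7.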

[28]  arXiv:2107.13591 (cross-list from physics.med-ph) [pdf, other]
Title: Detection of squawks in respiratory sounds of mechanically ventilated COVID-19 patients
Comments: 5 pages, 6 figures
Subjects: Medical Physics (physics.med-ph); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Mechanically ventilated patients typically exhibit abnormal respiratory sounds. Squawks are short inspiratory adventitious sounds that may occur in patients with pneumonia, such as COVID-19 patients. In this work we devised a method for squawk detection in mechanically ventilated patients by developing algorithms for respiratory cycle estimation, squawk candidate identification, feature extraction, and clustering. The best classifier reached an F1 of 0.48 at the sound file level and an F1 of 0.66 at the recording session level. These preliminary results are promising, as they were obtained in noisy environments. This method will give health professionals a new feature to assess the potential deterioration of critically ill patients.

[29]  arXiv:2107.13617 (cross-list from cs.SD) [pdf, other]
Title: Pitch-Informed Instrument Assignment Using a Deep Convolutional Network with Multiple Kernel Shapes
Comments: 4 figures, 4 tables and 7 pages. Accepted for publication at ISMIR Conference 2021
Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Audio and Speech Processing (eess.AS)

This paper proposes a deep convolutional neural network for performing note-level instrument assignment. Given a polyphonic multi-instrumental music signal along with its ground truth or predicted notes, the objective is to assign an instrumental source for each note. This problem is addressed as a pitch-informed classification task where each note is analysed individually. We also propose to utilise several kernel shapes in the convolutional layers in order to facilitate learning of efficient timbre-discriminative feature maps. Experiments on the MusicNet dataset using 7 instrument classes show that our approach is able to achieve an average F-score of 0.904 when the original multi-pitch annotations are used as the pitch information for the system, and that it also excels if the note information is provided using third-party multi-pitch estimation algorithms. We also include ablation studies investigating the effects of the use of multiple kernel shapes and comparing different input representations for the audio and the note-related information.

[30]  arXiv:2107.13632 (cross-list from cs.GT) [pdf, other]
Title: Efficient Episodic Learning of Nonstationary and Unknown Zero-Sum Games Using Expert Game Ensembles
Subjects: Computer Science and Game Theory (cs.GT); Systems and Control (eess.SY)

Game theory provides essential analysis in many applications of strategic interactions. However, the questions of how to construct a game model and what its fidelity is are seldom addressed. In this work, we consider learning in a class of repeated zero-sum games with an unknown, time-varying payoff matrix and noisy feedback, by making use of an ensemble of benchmark game models. These models can be pre-trained and collected dynamically during sequential plays. They serve as prior side information and imperfectly underpin the unknown true game model. We propose OFULinMat, an episodic learning algorithm that integrates the adaptive estimation of game models and the learning of the strategies. The proposed algorithm is shown to achieve a sublinear bound on the saddle-point regret. We show that this algorithm is provably efficient through both theoretical analysis and numerical examples. We use a dynamic honeypot allocation game as a case study to illustrate and corroborate our results. We also discuss the relationship and highlight the difference between our framework and the classical adversarial multi-armed bandit framework.

[31]  arXiv:2107.13643 (cross-list from cs.CV) [pdf]
Title: Lighter Stacked Hourglass Human Pose Estimation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Human pose estimation (HPE) is one of the most challenging tasks in computer vision, as humans are deformable by nature and their poses therefore vary widely. HPE aims to correctly identify the main joint locations of a single person or multiple people in a given image or video. Locating the joints of a person in images or videos is an important task that can be applied in action recognition and object tracking. Like many computer vision tasks, HPE has advanced massively with the introduction of deep learning to the field. In this paper, we focus on one of the deep learning-based approaches to HPE, proposed by Newell et al., which they named the stacked hourglass network. Their approach is widely used in many applications and is regarded as one of the best works in this area. The main focus of their approach is to capture as much information as possible at all scales so that a coherent understanding of both the local features and the full-body location is achieved. Their findings demonstrate that important cues such as the orientation of a person, the arrangement of limbs, and the relative locations of adjacent joints can be identified from multiple scales at different resolutions. To do so, they use a single pipeline that processes images at multiple resolutions, with skip layers that preserve spatial information at each resolution. The resolution goes as low as 4x4 so that the smallest spatial features are included. In this study, we investigate the effect of architectural modifications on the computational speed and accuracy of the network.

[32]  arXiv:2107.13832 (cross-list from cs.SD) [pdf, other]
Title: Blind Room Parameter Estimation Using Multiple-Multichannel Speech Recordings
Comments: Accepted in WASPAA 2021 (IEEE Workshop on Applications of Signal Processing to Audio and Acoustics)
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

Knowing the geometrical and acoustical parameters of a room may benefit applications such as audio augmented reality, speech dereverberation or audio forensics. In this paper, we study the problem of jointly estimating the total surface area, the volume, as well as the frequency-dependent reverberation time and mean surface absorption of a room in a blind fashion, based on two-channel noisy speech recordings from multiple, unknown source-receiver positions. A novel convolutional neural network architecture leveraging both single- and inter-channel cues is proposed and trained on a large, realistic simulated dataset. Results on both simulated and real data show that using multiple observations in one room significantly reduces estimation errors and variances on all target quantities, and that using two channels helps the estimation of surface and volume. The proposed model outperforms a recently proposed blind volume estimation method on the considered datasets.

[33]  arXiv:2107.13856 (cross-list from cs.LG) [pdf, other]
Title: Predicting battery end of life from solar off-grid system field data using machine learning
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY); Applications (stat.AP)

Hundreds of millions of people lack access to electricity. Decentralised solar-battery systems are key for addressing this whilst avoiding carbon emissions and air pollution, but are hindered by relatively high costs and rural locations that inhibit timely preventative maintenance. Accurate diagnosis of battery health and prediction of end of life from operational data improves user experience and reduces costs. However, the lack of controlled validation tests and variable data quality mean that existing lab-based techniques fail to work. We apply a scalable probabilistic machine learning approach to diagnose health in 1027 solar-connected lead-acid batteries, each running for 400-760 days, totalling 620 million data rows. We demonstrate 73% accurate prediction of end of life, eight weeks in advance, rising to 82% at the point of failure. This work highlights the opportunity to estimate health from existing measurements using 'big data' techniques, without additional equipment, extending lifetime and improving performance in real-world applications.

[34]  arXiv:2107.13869 (cross-list from cs.NI) [pdf, other]
Title: Autonomous UAV Base Stations for Next Generation Wireless Networks: A Deep Learning Approach
Comments: 7 pages, 6 figures
Subjects: Networking and Internet Architecture (cs.NI); Systems and Control (eess.SY)

To address the ever-growing connectivity demands of wireless communications, the adoption of ingenious solutions, such as Unmanned Aerial Vehicles (UAVs) as mobile Base Stations (BSs), is imperative. In general, the location of a UAV Base Station (UAV-BS) is determined by optimization algorithms, which have high computational complexity and place heavy demands on UAV resources. In this paper, we show that a Convolutional Neural Network (CNN) model can be trained to infer the location of a UAV-BS in real time. In so doing, we create a framework to determine the UAV locations that considers the deployment of Mobile Users (MUs) to generate labels by using the data obtained from an optimization algorithm. Performance evaluations reveal that once the CNN model is trained with the given labels and locations of MUs, the proposed approach is capable of approximating the results given by the adopted optimization algorithm with high fidelity, outperforming Reinforcement Learning (RL)-based approaches. We also explore future research challenges and highlight key issues.

[35]  arXiv:2107.13875 (cross-list from cs.LG) [pdf, other]
Title: Spatio-temporal graph neural networks for multi-site PV power forecasting
Comments: 10 pages, 6 figures, submitted to IEEE Transactions on Sustainable Energy
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)

Accurate forecasting of solar power generation with fine temporal and spatial resolution is vital for the operation of the power grid. However, state-of-the-art approaches that combine machine learning with numerical weather predictions (NWP) have coarse resolution. In this paper, we take a graph signal processing perspective and model multi-site photovoltaic (PV) production time series as signals on a graph to capture their spatio-temporal dependencies and achieve higher spatial and temporal resolution forecasts. We present two novel graph neural network models for deterministic multi-site PV forecasting, dubbed the graph-convolutional long short-term memory (GCLSTM) and the graph-convolutional transformer (GCTrafo) models. These methods rely solely on production data and exploit the intuition that PV systems provide a dense network of virtual weather stations. The proposed methods were evaluated on two datasets covering an entire year: 1) production data from 304 real PV systems, and 2) simulated production of 1000 PV systems, both distributed over Switzerland. The proposed models outperform state-of-the-art multi-site forecasting methods for prediction horizons of six hours ahead. Furthermore, the proposed models outperform state-of-the-art single-site methods with NWP as inputs on horizons up to four hours ahead.

[36]  arXiv:2107.13969 (cross-list from cs.CY) [pdf, other]
Title: Significance of Speaker Embeddings and Temporal Context for Depression Detection
Subjects: Computers and Society (cs.CY); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Depression detection from speech has attracted a lot of attention in recent years. However, the significance of speaker-specific information in depression detection has not yet been explored. In this work, we analyze the significance of speaker embeddings for the task of depression detection from speech. Experimental results show that the speaker embeddings provide important cues to achieve state-of-the-art performance in depression detection. We also show that combining conventional OpenSMILE and COVAREP features, which carry complementary information, with speaker embeddings further improves the depression detection performance. The significance of temporal context in the training of deep learning models for depression detection is also analyzed in this paper.

[37]  arXiv:2107.14009 (cross-list from cs.SD) [pdf, other]
Title: PKSpell: Data-Driven Pitch Spelling and Key Signature Estimation
Comments: International Society for Music Information Retrieval Conference (ISMIR), Nov 2021, Online, India
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

We present PKSpell: a data-driven approach for the joint estimation of pitch spelling and key signatures from MIDI files. Both elements are fundamental for the production of a full-fledged musical score and facilitate many MIR tasks such as harmonic analysis, section identification, melodic similarity, and search in a digital music library. We design a deep recurrent neural network model that only requires information readily available in all kinds of MIDI files, including performances, or other symbolic encodings. We release a model trained on the ASAP dataset. Our system can be used with these pre-trained parameters and is easy to integrate into a MIR pipeline. We also propose a data augmentation procedure that helps retraining on small datasets. PKSpell achieves strong key signature estimation performance on a challenging dataset. Most importantly, this model establishes a new state-of-the-art performance on the MuseData pitch spelling dataset without retraining.

[38]  arXiv:2107.14028 (cross-list from cs.SD) [pdf, other]
Title: Estimating Respiratory Rate From Breath Audio Obtained Through Wearable Microphones
Comments: International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) 2021
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

Respiratory rate (RR) is a clinical metric used to assess overall health and physical fitness. An individual's RR can change from their baseline due to chronic illness symptoms (e.g., asthma, congestive heart failure), acute illness (e.g., breathlessness due to infection), and over the course of the day due to physical exhaustion during heightened exertion. Remote estimation of RR can offer a cost-effective method to track disease progression and cardio-respiratory fitness over time. This work investigates a model-driven approach to estimate RR from short audio segments obtained after physical exertion in healthy adults. Data was collected from 21 individuals using microphone-enabled, near-field headphones before, during, and after strenuous exercise. RR was manually annotated by counting perceived inhalations and exhalations. A multi-task Long Short-Term Memory (LSTM) network with convolutional layers was implemented to process mel-filterbank energies, estimate RR in varying background noise conditions, and predict heavy breathing, indicated by an RR of more than 25 breaths per minute. The multi-task model performs both classification and regression tasks and leverages a mixture of loss functions. It was observed that RR can be estimated with a concordance correlation coefficient (CCC) of 0.76 and a mean squared error (MSE) of 0.2, demonstrating that audio can be a viable signal for approximating RR.

[39]  arXiv:2107.14051 (cross-list from cs.CV) [pdf]
Title: Improvement of image classification by multiple optical scattering
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Multiple optical scattering occurs when light propagates in a non-uniform medium. During multiple scattering, images are distorted and the spatial information they carry becomes scrambled. However, the image information is not lost but is present in the form of speckle patterns (SPs). In this study, we built an optical random scattering system based on an LCD and an RGB laser source. We found that image classification can be improved with the help of random scattering, which acts as a feedforward neural network that extracts features from images. Together with ridge classification deployed on a computer, we achieved excellent classification accuracy, higher than 94%, for a variety of datasets covering medical, agricultural, environmental protection and other fields. In addition, the proposed optical scattering system has the advantages of high speed, low power consumption, and miniaturization, which make it suitable for deployment in edge computing applications.

[40]  arXiv:2107.14061 (cross-list from cs.CV) [pdf]
Title: The Need and Status of Sea Turtle Conservation and Survey of Associated Computer Vision Advances
Comments: Currently under review
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Image and Video Processing (eess.IV)

For hundreds of millions of years, sea turtles and their ancestors have swum in the vast expanses of the ocean. They have undergone a number of evolutionary changes, leading to speciation and sub-speciation. However, in the past few decades, some of the most notable forces driving genetic variance and population decline have been global warming and anthropogenic impact, ranging from large-scale poaching and collecting turtle eggs for food to dumping trash, including plastic waste, into the ocean. This has severe detrimental effects on the sea turtle population, driving them towards extinction. This research focuses on the forces causing the decline in the sea turtle population, the necessity for global conservation efforts along with their successes and failures, followed by an in-depth analysis of the modern advances in the detection and recognition of sea turtles, involving Machine Learning and Computer Vision systems, aiding the conservation efforts.

[41]  arXiv:2107.14070 (cross-list from cs.CV) [pdf]
Title: Machine Learning Advances aiding Recognition and Classification of Indian Monuments and Landmarks
Comments: Currently under review
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG); Image and Video Processing (eess.IV)

Tourism in India plays a quintessential role in the country's economy, with an estimated 9.2% GDP share for the year 2018. With a yearly growth rate of 6.2%, the industry holds huge potential for becoming the primary driver of the economy, as observed in Middle Eastern nations like the United Arab Emirates. The historical and cultural diversity exhibited throughout the geography of the nation is a unique spectacle for people around the world and therefore attracts tens of millions of tourists every year. Traditionally, tour guides or academic professionals who study these heritage monuments were responsible for providing information to visitors regarding their architectural and historical significance. However, this system has several caveats when considered on a large scale, such as the unavailability of sufficiently trained people, lack of accurate information, and failure to convey the richness of details in an attractive format. Recently, machine learning approaches revolving around the usage of monument pictures have been shown to be useful for rudimentary analysis of heritage sites. This paper serves as a survey of the research endeavors undertaken in this direction, which would eventually provide insights for building an automated decision system that could be utilized to make the experience of tourism in India more modern for visitors.

[42]  arXiv:2107.14123 (cross-list from cs.CV) [pdf, other]
Title: Mapping Vulnerable Populations with AI
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Humanitarian actions require accurate information to efficiently delegate support operations. Such information can be maps of building footprints, building functions, and population densities. While access to this information is comparatively easy in industrialized countries thanks to reliable census data and national geo-data infrastructures, this is not the case for developing countries, where such data is often incomplete or outdated. Building maps derived from remote sensing images may partially remedy this challenge in such countries, but are not always accurate due to different landscape configurations and a lack of validation data. Even when they exist, building footprint layers usually do not reveal more fine-grained building properties, such as the number of stories or the building's function (e.g., office, residential, school, etc.). In this project we aim to automate building footprint and function mapping using heterogeneous data sources. In a first step, we intend to delineate buildings from satellite data, using deep learning models for semantic image segmentation. Building functions shall be retrieved by parsing social media data such as tweets, as well as ground-based imagery, to automatically identify different building functions and retrieve further information such as the number of building stories. Building maps augmented with those additional attributes make it possible to derive more accurate population density maps, needed to support the targeted provision of humanitarian aid.

[43]  arXiv:2107.14132 (cross-list from cs.SD) [pdf, other]
Title: Multi-Task Learning in Utterance-Level and Segmental-Level Spoof Detection
Comments: Submitted to ASVspoof 2021 Workshop
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

In this paper, we provide a series of multi-task benchmarks for simultaneously detecting spoofing at the segmental and utterance levels in the PartialSpoof database. First, we propose the SELCNN network, which inserts squeeze-and-excitation (SE) blocks into a light convolutional neural network (LCNN) to enhance the capacity of hidden feature selection. Then, we implement multi-task learning (MTL) frameworks with SELCNN followed by bidirectional long short-term memory (Bi-LSTM) as the basic model. We discuss MTL in PartialSpoof in terms of architecture (uni-branch/multi-branch) and training strategies (from-scratch/warm-up) step-by-step. Experiments show that the multi-task model performs better than single-task models. Also, in MTL, a binary-branch architecture utilizes information from the two levels more adequately than a uni-branch model. For the binary-branch architecture, fine-tuning a warm-up model works better than training from scratch. Overall, under the binary-branch multi-task architecture, models can handle segment-level and utterance-level predictions simultaneously. Furthermore, the multi-task model trained by fine-tuning a segmental warm-up model performs relatively better at both levels, except on the evaluation set for segmental detection. Segmental detection should be explored further.

[44]  arXiv:2107.14177 (cross-list from quant-ph) [pdf]
Title: Parallel and real-time post-processing for quantum random number generators
Comments: 9 pages, 8 figures
Subjects: Quantum Physics (quant-ph); Signal Processing (eess.SP)

Quantum random number generators (QRNGs) based on continuous-variable (CV) quantum fluctuations offer great potential owing to their advantages in measurement bandwidth, stability and integrability. More importantly, they provide an efficient and extensible path for significantly increasing the QRNG generation rate. In this process, real-time randomness extraction using information-theoretically secure randomness extractors is vital, because it plays a critical role in limiting the throughput rate and implementation cost of QRNGs. In this work, we investigate the parallel and real-time realization of several Toeplitz-hashing extractors within one field-programmable gate array (FPGA) for parallel QRNG. The layout of the Toeplitz matrices and the efficient utilization of hardware computing resources in the FPGA are studied in detail. Logic resource occupation for different scales and quantities of Toeplitz matrices is analyzed, and a two-layer parallel pipeline algorithm is carefully designed to fully exploit the parallel algorithm advantage and hardware resources of the FPGA. This work finally achieves a real-time post-processing rate of QRNG above 8 Gbps. Matched with an integrated circuit for parallel extraction of multiple quantum sideband modes of the vacuum state, our demonstration shows an important step towards chip-based parallel QRNG, which could effectively improve the practicality of CV QRNGs, including device-trusted, device-independent, and semi-device-independent schemes.
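Toeplitz hashing compresses n raw, partially random bits into m < n nearly uniform bits by multiplying them with a random m x n binary Toeplitz matrix over GF(2). Because a Toeplitz matrix is constant along each diagonal, n + m - 1 seed bits define it completely, which is part of what makes it attractive for hardware. The sketch below is a serial reference model only, not the two-layer parallel FPGA pipeline described in the abstract; the function name and the indexing convention `T[i][j] = seed[i - j + n - 1]` are assumptions, and choosing m from a min-entropy estimate is outside its scope.

```python
def toeplitz_extract(raw_bits, seed_bits, m):
    """Compress n raw bits to m output bits with an m x n binary Toeplitz matrix.

    The matrix is T[i][j] = seed_bits[i - j + n - 1], so entries are constant
    along each diagonal and n + m - 1 seed bits define the whole matrix.
    Arithmetic is over GF(2): AND for multiplication, XOR for addition.
    """
    n = len(raw_bits)
    if len(seed_bits) != n + m - 1:
        raise ValueError("seed must contain n + m - 1 bits")
    out = []
    for i in range(m):
        acc = 0
        for j in range(n):
            acc ^= seed_bits[i - j + n - 1] & raw_bits[j]
        out.append(acc)
    return out


seed = [1, 0, 1, 1]                          # n + m - 1 = 3 + 2 - 1 = 4 bits
print(toeplitz_extract([1, 1, 0], seed, 2))  # [1, 0]
```

Each output bit is an independent XOR over AND terms, and the map is linear over GF(2), so output rows can be computed concurrently; this independence is the kind of structure a parallel hardware implementation can exploit.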

[45]  arXiv:2107.14218 (cross-list from cs.IT) [pdf, ps, other]
Title: Gossiping with Binary Freshness Metric
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)

We consider the binary freshness metric for gossip networks that consist of a single source and $n$ end-nodes, where the end-nodes are allowed to share their stored versions of the source information with the other nodes. We develop recursive equations that characterize binary freshness in arbitrarily connected gossip networks using the stochastic hybrid systems (SHS) approach. Next, we study binary freshness in several structured gossip networks, namely disconnected, ring and fully connected networks. We show that for both disconnected and ring network topologies, when the number of nodes gets large, the binary freshness of a node decreases down to 0 as $n^{-1}$, but the freshness is strictly larger for the ring topology. We also show that for the fully connected topology, the rate of decrease to 0 is slower, and it takes the form of $n^{-\rho}$ for a $\rho$ smaller than 1, when the update rates of the source and the end-nodes are sufficiently large. Finally, we study the binary freshness metric for clustered gossip networks, where multiple clusters of structured gossip networks are connected to the source node through designated access nodes, i.e., cluster heads. We characterize the binary freshness in such networks and numerically study how the optimal cluster sizes change with respect to the update rates in the system.
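As a rough intuition for the disconnected case, the sketch below assumes a simplified model that is not taken from the paper: a node's binary freshness is 1 while it holds the source's latest version and 0 otherwise, the source generates new versions at Poisson rate $\lambda_e$ (making the node stale), and the source's updates reach the node at Poisson rate $\lambda_s$ (making it fresh again). This two-state Markov chain has long-run freshness $\lambda_s/(\lambda_s+\lambda_e)$. If a total source rate $\lambda$ is split evenly over $n$ disconnected nodes, then $\lambda_s = \lambda/n$ and the freshness behaves like $\lambda/(n\lambda_e)$ for large $n$, in line with the $n^{-1}$ decay stated above. The function names and parameter values are illustrative.

```python
import random


def freshness_closed_form(lam_e, lam_s):
    """Long-run fraction of time a single node is fresh (two-state Markov chain)."""
    return lam_s / (lam_s + lam_e)


def freshness_simulated(lam_e, lam_s, horizon, seed=0):
    """Monte Carlo estimate of the same quantity.

    While fresh, the node goes stale at the next source update (rate lam_e);
    while stale, it becomes fresh at the next update it receives (rate lam_s).
    Other events leave the state unchanged and, being memoryless, can be skipped.
    """
    rng = random.Random(seed)
    t, fresh, fresh_time = 0.0, True, 0.0
    while t < horizon:
        dt = min(rng.expovariate(lam_e if fresh else lam_s), horizon - t)
        if fresh:
            fresh_time += dt
        t += dt
        fresh = not fresh
    return fresh_time / horizon


# Disconnected network: the source splits a total rate lam = 10 evenly over n nodes.
lam, lam_e = 10.0, 1.0
for n in (1, 10, 100):
    print(n, freshness_closed_form(lam_e, lam / n), freshness_simulated(lam_e, lam / n, 1e5))
```

Gossip between end-nodes adds further transitions into the fresh state, which is why the ring and fully connected topologies analysed in the paper can do better than this baseline; those cases require the SHS analysis and are not reproduced here.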

[46]  arXiv:1909.10874 (replaced) [pdf, other]
Title: Resilient Coordinated Movement of Connected Autonomous Vehicles
Comments: 10 pages, 7 figures, 1 algorithm
Subjects: Multiagent Systems (cs.MA); Cryptography and Security (cs.CR); Systems and Control (eess.SY)
[47]  arXiv:1912.05231 (replaced) [src]
Title: Sparse Joint Transmission for Cell-Free Massive MIMO: A Sparse PCA Approach
Comments: The new version entitled "Sparse Joint Transmission for Cloud Radio Access Networks with Limited Fronthaul Capacity" was uploaded
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)
[48]  arXiv:2001.04073 (replaced) [src]
Title: Noncooperative Precoding for Massive MIMO HetNets: SILNR Maximization Precoding
Comments: The new version entitled "Distributed Precoding Using Local CSIT for MU-MIMO Heterogeneous Cellular Networks" (arXiv:2011.00727) was uploaded
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)
[49]  arXiv:2003.08773 (replaced) [pdf, other]
Title: Do CNNs Encode Data Augmentations?
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Machine Learning (stat.ML)
[50]  arXiv:2005.00864 (replaced) [pdf, other]
Title: Nonlinear analysis of charge-pump phase-locked loop: the hold-in and pull-in ranges
Subjects: Signal Processing (eess.SP)
[51]  arXiv:2006.09858 (replaced) [pdf, other]
Title: Geometry of Similarity Comparisons
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Machine Learning (stat.ML)
[52]  arXiv:2009.05261 (replaced) [pdf, other]
Title: End-to-end Learning for OFDM: From Neural Receivers to Pilotless Communication
Subjects: Information Theory (cs.IT); Machine Learning (cs.LG); Signal Processing (eess.SP)
[53]  arXiv:2011.05645 (replaced) [pdf]
Title: From Aircraft Tracking Data to Network Delay Model: A Data-Driven Approach Considering En-Route Congestion
Comments: 57 pages, 40 figures
Subjects: Systems and Control (eess.SY)
[54]  arXiv:2011.10829 (replaced) [pdf, other]
Title: On the Convergence of Reinforcement Learning in Nonlinear Continuous State Space Problems
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
[55]  arXiv:2012.05680 (replaced) [pdf, other]
Title: Direct multimodal few-shot learning of speech and images
Comments: Accepted to Interspeech 2021
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[56]  arXiv:2102.10253 (replaced) [pdf, ps, other]
Title: Safety Embedded Control of Nonlinear Systems via Barrier States
Comments: Updates: Corrected typos and added clarifying equations and discussions
Journal-ref: in IEEE Control Systems Letters, vol. 6, pp. 1328-1333, 2022
Subjects: Systems and Control (eess.SY)
[57]  arXiv:2103.03310 (replaced) [pdf, ps, other]
Title: On Angular Speed Estimation of Rigid Bodies
Comments: V1 is the extended version of the paper submitted to IEEE Control Systems Letters. V2 is the extended version of the paper accepted for publication. V3 fixes some minor oversights in the early access version of the paper published online. Corrections are included in the printed version. V4 fixes a minor oversight in the printed version of the paper
Journal-ref: IEEE Control Systems Letters, Volume: 6, pages 1394-1399, 2022
Subjects: Systems and Control (eess.SY)
[58]  arXiv:2103.11096 (replaced) [pdf, other]
Title: An Efficient Calibration Method for Triaxial Gyroscope
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
[59]  arXiv:2103.15362 (replaced) [pdf, other]
Title: Self-triggered Stabilization of Discrete-time Linear Systems with Quantized State Measurements
Authors: Masashi Wakaiki
Comments: 8 pages
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
[60]  arXiv:2104.09920 (replaced) [pdf, other]
Title: GPS-denied Navigation: Attitude, Position, Linear Velocity, and Gravity Estimation with Nonlinear Stochastic Observer
Authors: Hashim A Hashim
Comments: 2021 IEEE American Control Conference (ACC)
Subjects: Systems and Control (eess.SY)
[61]  arXiv:2104.14985 (replaced) [pdf]
Title: A Path to Smart Radio Environments: An Industrial Viewpoint on Reconfigurable Intelligent Surfaces
Comments: Submitted to IEEE Wireless Communications on April 27, 2021. Revised on July 29, 2021
Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT); Signal Processing (eess.SP)
[62]  arXiv:2105.02469 (replaced) [pdf, other]
Title: Point Cloud Audio Processing
Comments: Accepted at WASPAA 2021, Code: this https URL
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[63]  arXiv:2105.07596 (replaced) [pdf, other]
Title: Sound Event Detection with Adaptive Frequency Selection
Comments: Accepted by IEEE Workshop on Applications of Signal Processing to Audio and Acoustics 2021
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[64]  arXiv:2107.04954 (replaced) [pdf, other]
Title: ReconVAT: A Semi-Supervised Automatic Music Transcription Framework for Low-Resource Real-World Data
Comments: Accepted in ACMMM 21. Camera ready version
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[65]  arXiv:2107.08661 (replaced) [pdf, other]
Title: Translatotron 2: Robust direct speech-to-speech translation
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[66]  arXiv:2107.11001 (replaced) [pdf, other]
Title: Photon-Starved Scene Inference using Single Photon Cameras
Comments: ICCV 2021 submission
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[67]  arXiv:2107.11723 (replaced) [pdf, other]
Title: A 51.3 TOPS/W, 134.4 GOPS In-memory Binary Image Filtering in 65nm CMOS
Comments: 13 pages
Subjects: Image and Video Processing (eess.IV); Hardware Architecture (cs.AR); Emerging Technologies (cs.ET)
