[1]  arXiv:2111.12138 [pdf, other]
Title: Multi-Modality Microscopy Image Style Transfer for Nuclei Segmentation
Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Quantitative Methods (q-bio.QM)

Annotating microscopy images for nuclei segmentation is laborious and time-consuming. To leverage the few existing annotations, also across multiple modalities, we propose a novel microscopy-style augmentation technique based on a generative adversarial network (GAN). Unlike other style transfer methods, it can not only deal with different cell assay types and lighting conditions, but also with different imaging modalities, such as bright-field and fluorescence microscopy. Using disentangled representations for content and style, we can preserve the structure of the original image while altering its style during augmentation. We evaluate our data augmentation on the 2018 Data Science Bowl dataset consisting of various cell assays, lighting conditions, and imaging modalities. With our style augmentation, the segmentation accuracy of the two top-ranked Mask R-CNN-based nuclei segmentation algorithms in the competition increases significantly. Thus, our augmentation technique renders the downstream task more robust to the test data heterogeneity and helps counteract class imbalance without resampling of minority classes.

[2]  arXiv:2111.12148 [pdf, other]
Title: Machine Learning Based Forward Solver: An Automatic Framework in gprMax
Comments: 6 pages, 6 figures
Subjects: Signal Processing (eess.SP); Geophysics (physics.geo-ph); Machine Learning (stat.ML)

General full-wave electromagnetic solvers, such as those utilizing the finite-difference time-domain (FDTD) method, are computationally demanding for simulating practical GPR problems. We explore the performance of a near-real-time, forward modeling approach for GPR that is based on a machine learning (ML) architecture. To ease the process, we have developed a framework that is capable of generating these ML-based forward solvers automatically. The framework uses an innovative training method that combines a predictive dimensionality reduction technique and a large data set of modeled GPR responses from our FDTD simulation software, gprMax. The forward solver is parameterized for a specific GPR application, but the framework can be extended in a straightforward manner to different electromagnetic problems.

[3]  arXiv:2111.12179 [pdf, other]
Title: Multifrequency 3D Elasticity Reconstruction with Structured Sparsity and ADMM
Subjects: Signal Processing (eess.SP)

We introduce a model-based iterative method to obtain shear modulus images of tissue using magnetic resonance elastography. The method jointly finds the displacement field that best fits multifrequency tissue displacement data and the corresponding shear modulus. The displacement satisfies a viscoelastic wave equation constraint, discretized using the finite element method. Sparsifying regularization terms in both the shear modulus and the displacement are used in the cost function minimized for the best fit. The formulated problem is bi-convex. Its solution can be obtained iteratively by using the alternating direction method of multipliers. Sparsifying regularizations and the wave equation constraint filter out sensor noise and compressional waves. Our method does not require bandpass filtering as a preprocessing step and converges quickly irrespective of the initialization. We evaluate our new method in multiple in silico and phantom experiments, with comparisons with existing methods, and we show improvements in contrast-to-noise and signal-to-noise ratios. Results from an in vivo liver imaging study show elastograms with mean elasticity comparable to other values reported in the literature.
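
The alternating direction method of multipliers mentioned above can be illustrated on a much simpler problem than the bi-convex elastography reconstruction. The sketch below is not the authors' method: it assumes a toy convex problem, minimizing 0.5*||x - b||^2 + lam*||z||_1 subject to x = z, chosen only because it shows the variable splitting and the sparsifying (soft-threshold) step. The function names and the values of `lam` and `rho` are illustrative assumptions.

```python
def soft_threshold(v, k):
    """Proximal operator of k*|.|: shrink v toward zero by k."""
    if v > k:
        return v - k
    if v < -k:
        return v + k
    return 0.0


def admm_l1_denoise(b, lam, rho=1.0, iters=200):
    """ADMM for: minimize 0.5*||x - b||^2 + lam*||z||_1  subject to x = z.

    Toy stand-in problem (assumption), not the elastography cost function.
    """
    n = len(b)
    x = [0.0] * n
    z = [0.0] * n
    u = [0.0] * n  # scaled dual variable
    for _ in range(iters):
        # x-update: closed-form minimizer of the quadratic data-fit term
        x = [(bi + rho * (zi - ui)) / (1.0 + rho) for bi, zi, ui in zip(b, z, u)]
        # z-update: soft-thresholding enforces sparsity
        z = [soft_threshold(xi + ui, lam / rho) for xi, ui in zip(x, u)]
        # dual update: accumulate the constraint violation x - z
        u = [ui + xi - zi for ui, xi, zi in zip(u, x, z)]
    return z


if __name__ == "__main__":
    print(admm_l1_denoise([3.0, 0.5, -2.0], lam=1.0))
```

For this toy problem the iterates approach the soft-thresholded data, which makes it easy to check that the splitting is implemented correctly before moving to a harder cost function.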

[4]  arXiv:2111.12194 [pdf, other]
Title: Energy Efficient Video Decoding for VVC Using a Greedy Strategy Based Design Space Exploration
Comments: Accepted by IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)
Subjects: Image and Video Processing (eess.IV)

IP traffic has increased significantly in recent years, and this trend is expected to continue. Recent studies report that the viewing of online video content accounts for 1% of global greenhouse gas emissions. To reduce the data traffic of video streaming, the new standard Versatile Video Coding (VVC) was finalized in 2020. In this paper, the energy efficiency of two different VVC decoders is analyzed in detail. Furthermore, we propose a design space exploration that uses an algorithm based on a greedy strategy to derive coding tool profiles that optimize the energy demand of the decoder. We show that the algorithm derives optimal coding tool profiles for a subset of coding tools. Additionally, we propose profiles that reduce the energy demand of VVC decoders and provide energy savings of more than 50% for sequences with 4K resolution. We also show that the proposed profiles can have a lower decoding energy demand than comparable HEVC-encoded bit streams while also having a significantly lower bit rate.
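
A greedy exploration over coding tool profiles, in the spirit of the strategy described above, might look like the following sketch. It is not the authors' algorithm: the tool names, the energy and bit-rate numbers, and the additive cost model in `toy_cost` are hypothetical placeholders for real decoder measurements. At each step the tool whose removal lowers the cost the most is disabled, and the search stops when no removal helps.

```python
# Hypothetical tools: name -> (decoding energy added, bit-rate saving provided)
TOOLS = {
    "tool_a": (4.0, 1.0),
    "tool_b": (1.0, 3.0),
    "tool_c": (2.5, 0.5),
}
BASE_ENERGY = 10.0   # assumed energy with every tool disabled
BASE_RATE = 8.0      # assumed bit rate with every tool disabled
RATE_WEIGHT = 2.0    # assumed weight trading bit rate against energy


def toy_cost(enabled):
    """Assumed additive model: decoding energy plus weighted bit rate."""
    energy = BASE_ENERGY + sum(TOOLS[t][0] for t in enabled)
    rate = BASE_RATE - sum(TOOLS[t][1] for t in enabled)
    return energy + RATE_WEIGHT * rate


def greedy_profile(tools, cost):
    """Disable one tool at a time, always taking the best single removal."""
    enabled = frozenset(tools)
    best_cost = cost(enabled)
    while enabled:
        # Evaluate every single-tool removal from the current profile
        cand_cost, tool = min((cost(enabled - {t}), t) for t in sorted(enabled))
        if cand_cost >= best_cost:
            break  # no single removal improves the profile
        enabled = enabled - {tool}
        best_cost = cand_cost
    return enabled, best_cost


if __name__ == "__main__":
    profile, cost_value = greedy_profile(TOOLS, toy_cost)
    print(sorted(profile), cost_value)
```

With an additive cost model such as this one, a greedy search finds the optimum; with interacting tools it is only a heuristic, which is consistent with the abstract claiming optimality for a subset of coding tools.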

[5]  arXiv:2111.12203 [pdf, other]
Title: KUIELab-MDX-Net: A Two-Stream Neural Network for Music Demixing
Comments: MDX Workshop @ ISMIR 2021, 7 pages, 3 figures
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

Recently, many methods based on deep learning have been proposed for music source separation. Some state-of-the-art methods have shown that stacking many layers with many skip connections improves the SDR performance. Although such a deep and complex architecture shows outstanding performance, it usually requires numerous computing resources and time for training and evaluation. This paper proposes a two-stream neural network for music demixing, called KUIELab-MDX-Net, which shows a good balance of performance and required resources. The proposed model has a time-frequency branch and a time-domain branch, each of which separates the stems. It blends the results from the two streams to generate the final estimation. KUIELab-MDX-Net took second place on leaderboard A and third place on leaderboard B in the Music Demixing Challenge at ISMIR 2021. This paper also summarizes experimental results on another benchmark, MUSDB18. Our source code is available online.

[6]  arXiv:2111.12215 [pdf, other]
Title: Explainable multiple abnormality classification of chest CT volumes with AxialNet and HiResCAM
Comments: 25 pages, 7 figures, 6 tables
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Understanding model predictions is critical in healthcare, to facilitate rapid verification of model correctness and to guard against use of models that exploit confounding variables. We introduce the challenging new task of explainable multiple abnormality classification in volumetric medical images, in which a model must indicate the regions used to predict each abnormality. To solve this task, we propose a multiple instance learning convolutional neural network, AxialNet, that allows identification of top slices for each abnormality. Next we incorporate HiResCAM, an attention mechanism, to identify sub-slice regions. We prove that for AxialNet, HiResCAM explanations are guaranteed to reflect the locations the model used, unlike Grad-CAM which sometimes highlights irrelevant locations. Armed with a model that produces faithful explanations, we then aim to improve the model's learning through a novel mask loss that leverages HiResCAM and 3D allowed regions to encourage the model to predict abnormalities based only on the organs in which those abnormalities appear. The 3D allowed regions are obtained automatically through a new approach, PARTITION, that combines location information extracted from radiology reports with organ segmentation maps obtained through morphological image processing. Overall, we propose the first model for explainable multi-abnormality prediction in volumetric medical images, and then use the mask loss to achieve a 33% improvement in organ localization of multiple abnormalities in the RAD-ChestCT data set of 36,316 scans, representing the state of the art. This work advances the clinical applicability of multiple abnormality modeling in chest CT volumes.

[7]  arXiv:2111.12227 [pdf, other]
Title: Artificial intelligence enabled radio propagation for communications-Part I: Channel characterization and antenna-channel optimization
Subjects: Signal Processing (eess.SP)

To provide higher data rates, as well as better coverage, cost efficiency, security, adaptability, and scalability, the 5G and beyond-5G networks are developed with various artificial intelligence techniques. In this two-part paper, we investigate the application of artificial intelligence (AI) and in particular machine learning (ML) to the study of wireless propagation channels. This first part provides a comprehensive overview of ML for channel characterization and ML-based antenna-channel optimization, and Part II gives a state-of-the-art literature review of channel scenario identification and channel modeling. Fundamental results and key concepts of ML for communication networks are presented, and widely used ML methods for channel data processing, propagation channel estimation, and characterization are analyzed and compared. A discussion of challenges and future research directions for ML-enabled next-generation networks, limited to the topics covered in this part, rounds off the paper.

[8]  arXiv:2111.12228 [pdf, other]
Title: Artificial intelligence enabled radio propagation for communications-Part II: Scenario identification and channel modeling
Subjects: Signal Processing (eess.SP)

This two-part paper investigates the application of artificial intelligence (AI) and in particular machine learning (ML) to the study of wireless propagation channels. In Part I, we introduced AI and ML and provided a comprehensive survey of ML-enabled channel characterization and antenna-channel optimization; in this part (Part II), we review state-of-the-art literature on scenario identification and channel modeling. In particular, the key ideas of ML for scenario identification and channel modeling/prediction are presented, and the widely used ML methods for propagation scenario identification and channel modeling and prediction are analyzed and compared. Based on the state of the art, the future challenges of AI/ML-based channel data processing techniques are given as well.

[9]  arXiv:2111.12260 [pdf, ps, other]
Title: Federated Dynamic Neural Network for Deep MIMO Detection
Subjects: Signal Processing (eess.SP)

In this paper, we develop a dynamic detection network (DDNet) based detector for multiple-input multiple-output (MIMO) systems. By constructing an improved DetNet (IDetNet) detector and the OAMPNet detector as two independent network branches, the DDNet detector performs sample-wise dynamic routing to adaptively select the better of the IDetNet and OAMPNet detectors for every sample under different system conditions. To avoid the prohibitive transmission overhead of dataset collection in centralized learning (CL), we propose the federated averaging (FedAve)-DDNet detector, where all raw data are kept at local clients and only locally trained model parameters are transmitted to the central server for aggregation. To further reduce the transmission overhead, we develop the federated gradient sparsification (FedGS)-DDNet detector by randomly sampling gradients with elaborately calculated probability when uploading gradients to the central server. Based on simulation results, the proposed DDNet detector consistently outperforms other detectors under all system conditions thanks to the sample-wise dynamic routing. Moreover, the federated DDNet detectors, especially the FedGS-DDNet detector, can reduce the transmission overhead by at least 25.7% while maintaining satisfactory detection accuracy.
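
The two federated ingredients named above, server-side averaging of client parameters and random sampling of gradient entries before upload, can be sketched generically as follows. This is an illustration under stated assumptions, not the FedAve-DDNet or FedGS-DDNet implementation: the abstract does not give the sampling probabilities, so the magnitude-proportional rule in `keep_probabilities` and its `budget` parameter are assumptions, and all function names are hypothetical.

```python
import random


def federated_average(client_weights, client_sizes):
    """Server step: average client parameter vectors, weighted by data size."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


def keep_probabilities(grad, budget):
    """Assumed rule: keep probability proportional to |g_i|, capped at 1."""
    total = sum(abs(g) for g in grad)
    if total == 0.0:
        return [0.0] * len(grad)
    return [min(1.0, budget * abs(g) / total) for g in grad]


def sparsify_gradient(grad, keep_prob, rng):
    """Keep entry i with probability p_i and rescale by 1/p_i.

    The rescaling keeps the sparsified gradient unbiased in expectation.
    """
    out = []
    for g, p in zip(grad, keep_prob):
        if p > 0.0 and rng.random() < p:
            out.append(g / p)
        else:
            out.append(0.0)  # dropped entries need not be transmitted
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    grad = [0.1, -2.0, 0.05, 1.5]
    probs = keep_probabilities(grad, budget=2.0)
    print(sparsify_gradient(grad, probs, rng))
    print(federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```

Only the nonzero entries of the sparsified gradient have to be uploaded, which is where the reduction in transmission overhead comes from.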

[10]  arXiv:2111.12277 [pdf, other]
Title: One-shot Voice Conversion For Style Transfer Based On Speaker Adaptation
Comments: Submitted to ICASSP 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

One-shot style transfer is a challenging task, since training on one utterance makes the model extremely prone to overfitting the training data, which causes low speaker similarity and a lack of expressiveness. In this paper, we build on the recognition-synthesis framework and propose a one-shot voice conversion approach for style transfer based on speaker adaptation. First, a speaker normalization module is adopted to remove speaker-related information from the bottleneck features extracted by ASR. Second, we adopt weight regularization in the adaptation process to prevent the overfitting caused by using only one utterance from the target speaker as training data. Finally, to comprehensively decouple the speech factors (content, speaker, and style) and transfer the source style to the target, a prosody module is used to extract a prosody representation. Experiments show that our approach is superior to state-of-the-art one-shot VC systems in terms of style and speaker similarity; additionally, our approach maintains good speech quality.

[11]  arXiv:2111.12322 [pdf]
Title: Stochastic optimal scheduling of demand response-enabled microgrids with renewable generations: An analytical-heuristic approach
Comments: Accepted by Journal of Cleaner Production
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)

In the context of the transition towards cleaner and sustainable energy production, microgrids have become an effective way of tackling environmental pollution and energy crisis issues. With the increasing penetration of renewables, how to coordinate demand response and renewable generation is a critical and challenging issue in the field of microgrid scheduling. To this end, this paper puts forward a bi-level scheduling model for isolated microgrids that considers multiple stakeholders, where the lower- and upper-level models respectively aim to minimize user cost and microgrid operational cost under real-time electricity pricing environments. To solve this model, this research combines the Jaya algorithm and the interior point method (IPM) to develop a hybrid analysis-heuristic solution method called Jaya-IPM, where the lower and upper levels are respectively addressed by the IPM and Jaya, and the scheduling scheme is obtained via iterations between the two levels. After that, the real-time prices updated by the upper-level model and the electricity plans determined by the lower-level model are alternately iterated between the upper and lower levels through the real-time pricing mechanism to obtain an optimal scheduling plan. The test results show that the proposed method can coordinate the uncertainty of renewable generation with demand response strategies, thereby achieving a balance between the interests of the microgrid and the users, and that by leveraging demand response, the flexibility of the load side can be fully exploited to achieve peak load shaving while maintaining the balance of supply and demand. In addition, the Jaya-IPM algorithm is proven to be superior to the traditional hybrid intelligent algorithm (HIA) and the CPLEX solver in terms of optimization results and calculation efficiency.
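
The Jaya half of the hybrid method can be sketched with the standard Jaya update rule, which moves each candidate toward the current best solution and away from the current worst and keeps the move only if it lowers the cost. The sketch below covers that heuristic alone and is not the authors' Jaya-IPM implementation: the lower-level IPM solve and the real-time pricing exchange are omitted, and the `sphere` cost, population size, iteration count, and bounds are illustrative assumptions.

```python
import random


def jaya_minimize(cost, bounds, pop_size=20, iters=100, seed=0):
    """Minimize `cost` over box `bounds` with the basic Jaya update."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    costs = [cost(p) for p in pop]
    for _ in range(iters):
        best = pop[costs.index(min(costs))]
        worst = pop[costs.index(max(costs))]
        for i, x in enumerate(pop):
            cand = []
            for j, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Move toward the best solution and away from the worst one
                v = x[j] + r1 * (best[j] - abs(x[j])) - r2 * (worst[j] - abs(x[j]))
                cand.append(min(hi, max(lo, v)))  # clip to the search box
            c = cost(cand)
            if c < costs[i]:  # greedy acceptance: keep only improvements
                pop[i], costs[i] = cand, c
    k = costs.index(min(costs))
    return pop[k], costs[k]


def sphere(x):
    """Toy cost (assumption) standing in for the upper-level operating cost."""
    return sum(v * v for v in x)


if __name__ == "__main__":
    solution, value = jaya_minimize(sphere, [(-5.0, 5.0)] * 3)
    print(solution, value)
```

In a bi-level setting, each call to `cost` would itself require solving the lower-level problem for the candidate prices, which is why an efficient lower-level solver such as an IPM matters.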

[12]  arXiv:2111.12339 [pdf, other]
Title: Dual Domain Waveform Design for Joint Communication and Sensing Systems
Comments: 6 pages
Subjects: Signal Processing (eess.SP)

The evolution of wireless communication systems towards millimeter-wave ($30-100$ GHz) and sub-THz ($>100$ GHz) frequency bands has highlighted the need for accurate and fast beam management and proactive link-blockage prediction in high-mobility scenarios. Joint Communication and Sensing (JC&S) systems aim at equipping communication terminals with sensing capabilities using the same time/frequency/space communication resources to solve, or alleviate, the aforementioned issues. For an efficient implementation, a suitable waveform design that combines communication and sensing capabilities is of utmost importance. This paper proposes a novel dual-domain waveform design approach that superimposes onto the Frequency-Time (FT) domain both the legacy orthogonal frequency division multiplexing modulation scheme and a sensing signal, purposely designed in the Delay-Doppler (DD) domain. The power of the two signals is properly allocated in FT and DD domains, respectively, to reduce their mutual interference and optimize both communication and sensing tasks. Numerical results show the effectiveness of the proposed JC&S waveform design approach, yielding target communication and sensing performance with a full time-frequency resource sharing.

[13]  arXiv:2111.12483 [pdf, other]
Title: LDP-Net: An Unsupervised Pansharpening Network Based on Learnable Degradation Processes
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Pansharpening in remote sensing aims at acquiring a high-resolution multispectral (HRMS) image directly by fusing a low-resolution multispectral (LRMS) image with a panchromatic (PAN) image. The main concern is how to effectively combine the rich spectral information of the LRMS image with the abundant spatial information of the PAN image. Recently, many methods based on deep learning have been proposed for the pansharpening task. However, these methods usually have two main drawbacks: 1) requiring HRMS images for supervised learning; and 2) simply ignoring the latent relation between the MS and PAN images and fusing them directly. To solve these problems, we propose a novel unsupervised network based on learnable degradation processes, dubbed LDP-Net. A reblurring block and a graying block are designed to learn the corresponding degradation processes, respectively. In addition, a novel hybrid loss function is proposed to constrain both spatial and spectral consistency between the pansharpened image and the PAN and LRMS images at different resolutions. Experiments on Worldview2 and Worldview3 images demonstrate that our proposed LDP-Net can fuse PAN and LRMS images effectively without the help of HRMS samples, achieving promising performance in terms of both qualitative visual effects and quantitative metrics.

[14]  arXiv:2111.12516 [pdf, other]
Title: LightSAFT: Lightweight Latent Source Aware Frequency Transform for Source Separation
Comments: MDX Workshop @ ISMIR 2021, 7 pages, 1 figure
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)

Conditioned source separation models have attracted significant attention because of their flexibility, applicability, and extensibility. Their performance was usually inferior to that of existing approaches, such as single-source separation models. However, a recently proposed method called LaSAFT-Net has shown that conditioned models can perform comparably to existing single-source separation models. This paper presents LightSAFT-Net, a lightweight version of LaSAFT-Net. As a baseline, it provided a sufficient SDR performance for comparison during the Music Demixing Challenge at ISMIR 2021. This paper also enhances the existing LightSAFT-Net by replacing the LightSAFT blocks in the encoder with TFC-TDF blocks. Our enhanced LightSAFT-Net outperforms the previous one with fewer parameters.

[15]  arXiv:2111.12539 [pdf, other]
Title: Information-Theoretic Approach for Model Reduction Over Finite Time Horizon
Subjects: Systems and Control (eess.SY)

This paper presents an information-theoretic approach to model reduction for finite-time simulation. Although system models are typically used for simulation over a finite time, most of the metrics (and pseudo-metrics) used for model accuracy assessment consider asymptotic behavior, e.g., Hankel singular values and the Kullback-Leibler (KL) rate metric. These metrics could further be used for model order reduction. Hence, in this paper, we propose a generalization of the KL divergence-based metric, called the n-step KL rate metric, which could be used to compare models over a finite time horizon. We then demonstrate that the asymptotic metrics for comparing dynamical systems may not accurately assess the model prediction uncertainties over a finite time horizon. Motivated by this finite-time analysis, we propose a new pragmatic approach to compute the influence of a subset of states on a combination of states, called information transfer (IT). Model reduction typically involves the removal or truncation of states. IT combines the concepts from the n-step KL rate metric and model reduction. Finally, we demonstrate the application of information transfer for model reduction. Although the analysis and definitions presented in this paper assume linear systems, they can be extended to nonlinear systems.

[16]  arXiv:2111.12638 [pdf, other]
Title: Optimal Robust Exact Differentiation via Linear Adaptive Techniques
Subjects: Systems and Control (eess.SY)

The problem of differentiating a function with bounded second derivative in the presence of bounded measurement noise is considered. Performance limitations in terms of the smallest achievable worst-case differentiation error of causal and exact differentiators are shown. A robust exact differentiator is then constructed via the adaptation of a single parameter of a linear differentiator. It is demonstrated that the resulting differentiator is robust with respect to noise, that it instantaneously converges to the exact derivative in the absence of noise, and that it attains the smallest possible -- hence optimal -- upper bound on its differentiation error under noisy measurements. For practical realization in the presence of sampled measurements, a discrete-time realization is shown that achieves optimal asymptotic accuracy with respect to the noise and the sampling period.

[17]  arXiv:2111.12678 [pdf, ps, other]
Title: Output Regulation by Postprocessing Internal Models for a Class of Multivariable Nonlinear Systems
Comments: The published version contains a few small rendering issues in some formulae. Here, these are corrected
Journal-ref: International Journal of Robust and Nonlinear Control, vol. 30, pp. 1115-1140, 2020
Subjects: Systems and Control (eess.SY)

In this paper we propose a new design paradigm, which employs a postprocessing internal model unit, to approach the problem of output regulation for a class of multivariable minimum-phase nonlinear systems possessing a partial normal form. Contrary to previous approaches, the proposed regulator handles control inputs of dimension larger than the number of regulated variables, provided that a controllability assumption holds, and can employ additional measurements that need not vanish at the ideal error-zeroing steady state, but that can be useful for stabilization purposes or to fulfil the minimum-phase requirement. Conditions for practical and asymptotic output regulation are given, underlining how in postprocessing schemes the design of internal models is necessarily intertwined with that of the stabilizer.

[18]  arXiv:2111.12124 (cross-list from cs.SD) [pdf, ps, other]
Title: Towards Learning Universal Audio Representations
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

The ability to learn universal audio representations that can solve diverse speech, music, and environment tasks can spur many applications that require general sound content understanding. In this work, we introduce a holistic audio representation evaluation suite (HARES) spanning 12 downstream tasks across audio domains and provide a thorough empirical study of recent sound representation learning systems on that benchmark. We discover that previous sound event classification or speech models do not generalize outside of their domains. We observe that more robust audio representations can be learned with the SimCLR objective; however, the model's transferability depends heavily on the model architecture. We find the Slowfast architecture is good at learning rich representations required by different domains, but its performance is affected by the normalization scheme. Based on these findings, we propose a novel normalizer-free Slowfast NFNet and achieve state-of-the-art performance across all domains.

[19]  arXiv:2111.12175 (cross-list from cs.LG) [pdf, other]
Title: Three-Way Deep Neural Network for Radio Frequency Map Generation and Source Localization
Comments: 5 pages, 5 figures
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)

In this paper, we present a Generative Adversarial Network (GAN) machine learning model to interpolate irregularly distributed measurements across the spatial domain to construct a smooth radio frequency map (RFMap) and then perform localization using a deep neural network. Monitoring wireless spectrum over spatial, temporal, and frequency domains will become a critical feature in facilitating dynamic spectrum access (DSA) in beyond-5G and 6G communication technologies. Localization, wireless signal detection, and spectrum policy-making are several of the applications where distributed spectrum sensing will play a significant role. Detection and positioning of wireless emitters is a very challenging task in a large spectral and spatial area. In order to construct a smooth RFMap database, a large number of measurements are required, which can be very expensive and time-consuming. One approach to help realize these systems is to collect finite localized measurements across a given area and then interpolate the measurement values to construct the database. Current methods in the literature employ channel modeling to construct the radio frequency map, which lacks the granularity for accurate localization, whereas our proposed approach reconstructs a new generalized RFMap. Localization results are presented and compared with conventional channel models.

[20]  arXiv:2111.12181 (cross-list from cs.IT) [pdf, other]
Title: Channel Characterization of Diffusion-based Molecular Communication with Multiple Fully-absorbing Receivers
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

In this paper an analytical model is introduced to describe the impulse response of the diffusive channel between a pointwise transmitter and a given fully-absorbing (FA) receiver in a molecular communication (MC) system. The presence of neighbouring FA nanomachines in the environment is taken into account by describing them as sources of negative molecules. The channel impulse responses of all the receivers are linked in a system of integral equations. The solution of the system with two receivers is obtained analytically. For a higher number of receivers the system of integral equations is solved numerically. It is also shown that the channel impulse response shape is distorted by the presence of the interferers. For instance, there is a time shift of the peak in the number of absorbed molecules compared to the case without interference, as predicted by the proposed model. The analytical derivations are validated by means of particle based simulations.

[21]  arXiv:2111.12209 (cross-list from cs.NI) [pdf]
Title: Sistema de sensoriamento sem fio aplicavel a deteccao de incendios florestais (Wireless sensing system applicable to forest fire detection)
Comments: in Portuguese
Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)

In this research work, a hardware and software system is developed that uses wireless sensors to monitor environmental variables such as temperature, gas concentration, and luminosity in order to detect forest fires. LoRa technology was used for the wireless sensor network, with a communication range that can reach on average up to 5 km in urban areas and 10 km in rural areas. The developed system also has an integrated web application (dashboard) that collects data in real time from the wireless sensors, which together form the sensor module, also called the device. This data is then presented on a map associated with the position of each sensor module. The developed system was tested in practical experiments that used flames, gases, and lighting to simulate the occurrence of fires. The tests demonstrated the feasibility of the developed hardware/software system for detecting fires in the simulated scenarios. The research is therefore promising and may be advanced in the future toward the detection of real fires.

[22]  arXiv:2111.12212 (cross-list from cs.IT) [pdf, other]
Title: Long-Term CSI-based Design for RIS-Aided Multiuser MISO Systems Exploiting Deep Reinforcement Learning
Comments: Under revision in IEEE journal. Keywords: Reconfigurable intelligent surface (RIS), intelligent reflecting surface (IRS)
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

In this paper, we study the transmission design for reconfigurable intelligent surface (RIS)-aided multiuser communication networks. Unlike most existing contributions, we consider long-term CSI-based transmission design, where both the beamforming vectors at the base station (BS) and the phase shifts at the RIS are designed based on long-term CSI, which can significantly reduce the channel estimation overhead. Due to the lack of an explicit ergodic data rate expression, we propose a novel deep deterministic policy gradient (DDPG) based algorithm to solve the optimization problem, which is trained using channel vectors generated in an offline manner. Simulation results demonstrate that the achievable net throughput is higher than that achieved by the conventional instantaneous-CSI based scheme when taking the channel estimation overhead into account.

[23]  arXiv:2111.12290 (cross-list from cs.CV) [pdf, other]
Title: Attention-based Dual-stream Vision Transformer for Radar Gait Recognition
Comments: Under review
Subjects: Computer Vision and Pattern Recognition (cs.CV); Signal Processing (eess.SP)

Radar gait recognition is robust to light variations and infringes less on privacy. Previous studies often utilize either spectrograms or cadence velocity diagrams. While the former shows the time-frequency patterns, the latter encodes the repetitive frequency patterns. In this work, a dual-stream neural network with attention-based fusion is proposed to fully aggregate the discriminant information from these two representations. Both streams are designed based on the Vision Transformer, which captures well the gait characteristics embedded in these representations. The proposed method is validated on a large benchmark dataset for radar gait recognition, which shows that it significantly outperforms state-of-the-art solutions.

[24]  arXiv:2111.12295 (cross-list from cs.LG) [pdf, other]
Title: Animal Behavior Classification via Deep Learning on Embedded Systems
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Signal Processing (eess.SP); Machine Learning (stat.ML)

We develop an end-to-end deep-neural-network-based algorithm for classifying animal behavior using accelerometry data on the embedded system of an artificial intelligence of things (AIoT) device installed in a wearable collar tag. The proposed algorithm jointly performs feature extraction and classification utilizing a set of infinite-impulse-response (IIR) and finite-impulse-response (FIR) filters together with a multilayer perceptron. The utilized IIR and FIR filters can be viewed as specific types of recurrent and convolutional neural network layers, respectively. We evaluate the performance of the proposed algorithm via two real-world datasets collected from grazing cattle. The results show that the proposed algorithm offers good intra- and inter-dataset classification accuracy and outperforms its closest contenders including two state-of-the-art convolutional-neural-network-based time-series classification algorithms, which are significantly more complex. We implement the proposed algorithm on the embedded system of the collar tag's AIoT device to perform in-situ classification of animal behavior. We achieve real-time in-situ behavior inference from accelerometry data without imposing any strain on the available computational, memory, or energy resources of the embedded system.
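
The remark that IIR and FIR filters can be viewed as recurrent and convolutional layers can be made concrete with a small sketch. It is an illustration only, not the authors' feature extractor: the first-order IIR form, the coefficient values, and the function names are assumptions, and in the paper the filter parameters are learned jointly with the multilayer perceptron.

```python
def fir_filter(x, taps):
    """FIR filter: y[n] = sum_k taps[k] * x[n - k].

    This is a causal 1-D convolution whose kernel is `taps`,
    with samples before the start of the signal taken as zero.
    """
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, b in enumerate(taps):
            if n - k >= 0:
                acc += b * x[n - k]
        y.append(acc)
    return y


def iir_filter(x, a, b):
    """First-order IIR filter: y[n] = a * y[n - 1] + b * x[n].

    The previous output acts as the hidden state of a linear recurrent unit.
    """
    y = []
    state = 0.0
    for xn in x:
        state = a * state + b * xn
        y.append(state)
    return y


if __name__ == "__main__":
    print(fir_filter([1.0, 2.0, 3.0, 4.0], [0.5, 0.5]))
    print(iir_filter([1.0, 0.0, 0.0, 0.0], a=0.5, b=1.0))
```

Because both operations are linear in their coefficients and differentiable, the taps and the feedback coefficient can be trained by gradient descent like any other network weight, while needing very few multiply-accumulate operations per sample on an embedded device.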

[25]  arXiv:2111.12316 (cross-list from math.DS) [pdf, ps, other]
Title: A comment on stabilizing reinforcement learning
Subjects: Dynamical Systems (math.DS); Machine Learning (cs.LG); Systems and Control (eess.SY)

This is a short comment on the paper "Asymptotically Stable Adaptive-Optimal Control Algorithm With Saturating Actuators and Relaxed Persistence of Excitation" by Vamvoudakis et al. The question of stability of reinforcement learning (RL) agents remains hard and the said work suggested an on-policy approach with a suitable stability property using a technique from adaptive control - a robustifying term to be added to the action. However, there is an issue with this approach to stabilizing RL, which we will explain in this note. Furthermore, Vamvoudakis et al. seem to have made a fallacious assumption on the Hamiltonian under a generic policy. To provide a positive result, we will not only indicate this mistake, but show critic neural network weight convergence under a stochastic, continuous-time environment, provided certain conditions on the behavior policy hold.

[26]  arXiv:2111.12324 (cross-list from cs.SD) [pdf, other]
Title: How Speech is Recognized to Be Emotional - A Study Based on Information Decomposition
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

The way that humans encode their emotion into speech signals is complex. For instance, an angry man may increase his pitch and speaking rate, and use impolite words. In this paper, we present a preliminary study on various emotional factors and investigate how each of them impacts modern emotion recognition systems. The key tool of our study is the SpeechFlow model presented recently, by which we are able to decompose speech signals into separate information factors (content, pitch, rhythm). Based on this decomposition, we carefully studied the performance of each information component and their combinations. We conducted the study on three different speech emotion corpora and chose an attention-based convolutional RNN as the emotion classifier. Our results show that rhythm is the most important component for emotional expression. Moreover, the cross-corpus results are very bad (even worse than random guessing), demonstrating that the present speech emotion recognition model is rather weak. Interestingly, by removing one or several unimportant components, the cross-corpus results can be improved. This demonstrates the potential of the decomposition approach towards generalizable emotion recognition.

[27]  arXiv:2111.12326 (cross-list from cs.SD) [pdf, other]
Title: A Study on Decoupled Probabilistic Linear Discriminant Analysis
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

Probabilistic linear discriminant analysis (PLDA) has broad application in open-set verification tasks, such as speaker verification. A key concern for PLDA is that the model is too simple (linear Gaussian) to deal with complicated data; however, the simplicity by itself is a major advantage of PLDA, as it leads to desirable generalization. An interesting research question is therefore how to improve the modeling capacity of PLDA while retaining the simplicity. This paper presents a decoupling approach, which involves a global model that is simple and generalizable, and a local model that is complex and expressive. While the global model holds a bird's-eye view of the entire data, the local model represents the details of individual classes. We conduct a preliminary study in this direction and investigate a simple decoupling model including both the global and local models. The new model, which we call decoupled PLDA, is tested on a speaker verification task. Experimental results show that it consistently outperforms the vanilla PLDA when the model is based on raw speaker vectors. However, when the speaker vectors are processed by length normalization, the advantage of decoupled PLDA is largely lost, suggesting future research on non-linear local models.

[28]  arXiv:2111.12331 (cross-list from cs.SD) [pdf, other]
Title: An MAP Estimation for Between-Class Variance
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

Probabilistic linear discriminant analysis (PLDA) has been widely used in open-set verification tasks, such as speaker verification. A potential issue of this model is that the training set often contains a limited number of classes, which makes the estimation of the between-class variance unreliable. This unreliable estimation often leads to degraded generalization. In this paper, we present an MAP estimation for the between-class variance, by employing an Inverse-Wishart prior. A key problem is that with hierarchical models such as PLDA, the prior is placed on the variance of class means while the likelihood is based on class members, which makes the posterior inference intractable. We derive a simple MAP estimation for such a model, and test it in both PLDA scoring and length normalization. In both cases, the MAP-based estimation delivers interesting performance improvement.
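
For orientation, the textbook conjugate case may help show what such an estimate looks like. The formula below is not the estimator derived in the paper: it assumes the simpler, non-hierarchical setting in which $n$ vectors $x_i \in \mathbb{R}^p$ with known mean $\mu$ are observed directly, and the symbols $\Psi$ (prior scale matrix), $\nu$ (prior degrees of freedom), and $S$ (scatter matrix) are notation introduced here.

```latex
% Assumed setting: x_i ~ N(mu, Sigma) i.i.d., mu known, Sigma ~ IW(Psi, nu).
\begin{align}
  S &= \sum_{i=1}^{n} (x_i - \mu)(x_i - \mu)^{\top}, \\
  \Sigma \mid x_1, \dots, x_n &\sim \mathcal{IW}\!\left(\Psi + S,\; \nu + n\right), \\
  \hat{\Sigma}_{\mathrm{MAP}} &= \frac{\Psi + S}{\nu + n + p + 1}.
\end{align}
```

The last line is the mode of the Inverse-Wishart posterior. With few classes (small $n$) the estimate is pulled towards the prior scale $\Psi$, and as $n$ grows it approaches the sample estimate $S/n$. In PLDA the class means are not observed directly, which is the intractability the abstract refers to.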

[29]  arXiv:2111.12382 (cross-list from cs.IT) [pdf, ps, other]
Title: Compressed Sensing Channel Estimation for OTFS Modulation in Non-Integer Delay-Doppler Domain
Comments: This is the author's self-archive preprint of a paper accepted in IEEE GLOBECOM 2021
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

This paper introduces a Compressed Sensing (CS) estimation scheme for Orthogonal Time Frequency Space (OTFS) channels with sparse multipath. The OTFS waveform represents signals in a two dimensional Delay-Doppler (DD) orthonormal basis. The proposed model does not require the assumption that the delays are integer multiples of the sampling period. The analysis shows that non-integer delay and Doppler shifts in the channel cannot be accurately modelled by integer approximations. An Orthogonal Matching Pursuit with Binary-division Refinement (OMPBR) estimation algorithm is proposed. The proposed estimator finds the best channel approximation over a continuous DD dictionary without integer approximations. This results in a significant reduction of the estimation normalized mean squared error with reasonable computational complexity.

[30]  arXiv:2111.12419 (cross-list from cs.CV) [pdf, other]
Title: NAM: Normalization-based Attention Module
Comments: 3 pages, 2 figures, 2 tables, 2 tables in the appendix
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Recognizing less salient features is the key to model compression. However, it has not been investigated in the revolutionary attention mechanisms. In this work, we propose a novel normalization-based attention module (NAM), which suppresses less salient weights. It applies a weight sparsity penalty to the attention modules, thus making them more computationally efficient while retaining similar performance. A comparison with three other attention mechanisms on both Resnet and Mobilenet indicates that our method results in higher accuracy. Code for this paper can be publicly accessed at https://github.com/Christian-lyc/NAM.

[31]  arXiv:2111.12429 (cross-list from cs.LG) [pdf, other]
Title: tsflex: flexible time series processing & feature extraction
Comments: The first two authors contributed equally. Submitted to SoftwareX
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Machine Learning (stat.ML)

Time series processing and feature extraction are crucial and time-intensive steps in conventional machine learning pipelines. Existing packages are limited in their real-world applicability, as they cannot cope with irregularly-sampled and asynchronous data. We therefore present $\texttt{tsflex}$, a domain-independent, flexible, and sequence-first Python toolkit for processing & feature extraction that is capable of handling irregularly-sampled sequences with unaligned measurements. This toolkit is sequence-first as (1) sequence-based arguments are leveraged for strided-window feature extraction, and (2) the sequence-index is maintained through all supported operations. $\texttt{tsflex}$ is flexible as it natively supports (1) multivariate time series, (2) multiple window-stride configurations, and (3) integration with processing and feature functions from other packages, while (4) making no assumptions about the data sampling rate regularity and synchronization. Other functionalities of this package are multiprocessing, in-depth execution time logging, support for categorical & time-based data, chunking sequences, and embedded serialization. $\texttt{tsflex}$ is developed to enable fast and memory-efficient time series processing & feature extraction. Results indicate that $\texttt{tsflex}$ is more flexible than similar packages while outperforming these toolkits in both runtime and memory usage.

[32]  arXiv:2111.12444 (cross-list from cs.IT) [pdf, other]
Title: Edge Artificial Intelligence for 6G: Vision, Enabling Technologies, and Applications
Comments: This work is a JSAC invited survey & tutorial paper. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Information Theory (cs.IT); Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)

The thriving of artificial intelligence (AI) applications is driving the further evolution of wireless networks. It has been envisioned that 6G will be transformative and will revolutionize the evolution of wireless from "connected things" to "connected intelligence". However, state-of-the-art deep learning and big data analytics based AI systems require tremendous computation and communication resources, causing significant latency, energy consumption, network congestion, and privacy leakage in both the training and inference processes. By embedding model training and inference capabilities into the network edge, edge AI stands out as a disruptive technology for 6G to seamlessly integrate sensing, communication, computation, and intelligence, thereby improving the efficiency, effectiveness, privacy, and security of 6G networks. In this paper, we shall provide our vision for scalable and trustworthy edge AI systems with integrated design of wireless communication strategies and decentralized machine learning models. New design principles of wireless networks, service-driven resource allocation optimization methods, as well as a holistic end-to-end system architecture to support edge AI will be described. Standardization, software and hardware platforms, and application scenarios are also discussed to facilitate the industrialization and commercialization of edge AI systems.

[33]  arXiv:2111.12494 (cross-list from cs.IT) [pdf, other]
Title: Time-Energy-Constrained Closed-Loop FBL Communication for Dependable MEC
Comments: Accepted for publication at CSCN 2021
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

The deployment of multi-access edge computing (MEC) is paving the way towards pervasive intelligence in future 6G networks. This new paradigm also poses emerging requirements of dependable communications, which go beyond ultra-reliable low latency communication (URLLC), focusing on the performance of a closed loop instead of that of a unidirectional link. This work studies the simple but efficient one-shot transmission scheme, investigating the closed-loop-reliability-optimal policy of blocklength allocation under stringent time and energy constraints.

[34]  arXiv:2111.12497 (cross-list from cs.IT) [pdf, ps, other]
Title: Performance of Reconfigurable Intelligent Surfaces in the Presence of Generalized Gaussian Noise
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

In this letter, we investigate the performance of reconfigurable intelligent surface (RIS)-assisted communications, under the assumption of generalized Gaussian noise (GGN), over Rayleigh fading channels. Specifically, we consider an RIS, equipped with $N$ reflecting elements, and derive a novel closed-form expression for the symbol error rate (SER) of arbitrary modulation schemes. The usefulness of the derived new expression is that it can be used to capture the SER performance in the presence of special additive noise distributions such as Gamma, Laplacian, and Gaussian noise. These special cases are also considered and their associated asymptotic SER expressions are derived, and then employed to quantify the achievable diversity order of the system. The theoretical framework is corroborated by numerical results, which reveal that the shaping parameter of the GGN ($\alpha$) has a negligible effect on the diversity order of RIS-assisted systems, particularly for large $\alpha$ values. Accordingly, the maximum achievable diversity order is determined by $N$.

[35]  arXiv:2111.12521 (cross-list from math.OC) [pdf, other]
Title: Probabilistic Behavioral Distance and Tuning - Reducing and aggregating complex systems
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY); Adaptation and Self-Organizing Systems (nlin.AO)

Given a complex system with a given interface to the rest of the world, what does it mean for the system to behave close to a simpler specification describing the behavior at the interface? We give several definitions for useful notions of distances between a complex system and a specification by combining a behavioral and probabilistic perspective. These distances can be used to tune a complex system to a specification. We show that our approach can successfully tune non-linear networked systems to behave like much smaller networks, allowing us to aggregate large sub-networks into one or two effective nodes. Finally, we discuss similarities and differences between our approach and $H_\infty$ model reduction.

[36]  arXiv:2111.12531 (cross-list from cs.SD) [pdf, ps, other]
Title: Non-Intrusive Binaural Speech Intelligibility Prediction from Discrete Latent Representations
Comments: 4 pages + 1 refs; 1 figure; submitted to IEEE SPL (pending review)
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

Non-intrusive speech intelligibility (SI) prediction from binaural signals is useful in many applications. However, most existing signal-based measures are designed to be applied to single-channel signals. Measures specifically designed to take into account the binaural properties of the signal are often intrusive - characterised by requiring access to a clean speech signal - and typically rely on combining both channels into a single-channel signal before making predictions. This paper proposes a non-intrusive SI measure that computes features from a binaural input signal using a combination of vector quantization (VQ) and contrastive predictive coding (CPC) methods. VQ-CPC feature extraction does not rely on any model of the auditory system and is instead trained to maximise the mutual information between the input signal and output features. The computed VQ-CPC features are input to a predicting function parameterized by a neural network. Two predicting functions are considered in this paper. Both feature extractor and predicting functions are trained on simulated binaural signals with isotropic noise. They are tested on simulated signals with isotropic and real noise. For all signals, the ground truth scores are the (intrusive) deterministic binaural STOI. Results are presented in terms of correlations and MSE and demonstrate that VQ-CPC features are able to capture information relevant to modelling SI and outperform all the considered benchmarks - even when evaluating on data comprising different noise field types.

[37]  arXiv:2111.12544 (cross-list from cs.CV) [pdf, other]
Title: LDDMM meets GANs: Generative Adversarial Networks for diffeomorphic registration
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

The purpose of this work is to contribute to the state of the art of deep-learning methods for diffeomorphic registration. We propose an adversarial learning LDDMM method for pairs of 3D mono-modal images based on Generative Adversarial Networks. The method is inspired by the recent literature for deformable image registration with adversarial learning. We combine the best performing generative, discriminative, and adversarial ingredients from the state of the art within the LDDMM paradigm. We have successfully implemented two models with the stationary and the EPDiff-constrained non-stationary parameterizations of diffeomorphisms. Our unsupervised and data-hungry approach has shown a competitive performance with respect to a benchmark supervised and rich-data approach. In addition, our method has shown similar results to model-based methods with a computational time under one second.

[38]  arXiv:2111.12557 (cross-list from cs.RO) [pdf, ps, other]
Title: Optimization-free Ground Contact Force Constraint Satisfaction in Quadrupedal Locomotion
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

We are seeking control design paradigms for legged systems that bypass costly algorithms dependent on the heavy on-board computers widely used in these systems, yet match what they can do by using less expensive optimization-free frameworks. In this work, we present our preliminary results in modeling and control design of a quadrupedal robot called \textit{Husky Carbon}, which is under development at Northeastern University (NU) in Boston. In our approach, we utilized a supervisory controller and an Explicit Reference Governor (ERG) to enforce ground reaction force constraints. These constraints are usually enforced using costly optimizations. However, in this work, the ERG manipulates the state references applied to the supervisory controller to enforce the ground contact constraints through an updated law based on Lyapunov stability arguments. As a result, the approach is much faster to compute than the widely used optimization-based methods.

[39]  arXiv:2111.12566 (cross-list from q-bio.QM) [pdf, other]
Title: Acoustical Analysis of Speech Under Physical Stress in Relation to Physical Activities and Physical Literacy
Comments: Submitted to Speech Prosody 2022
Subjects: Quantitative Methods (q-bio.QM); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Human speech production encompasses physiological processes that naturally react to physical stress. Stress caused by physical activity (PA), e.g., running, may lead to significant changes in a person's speech. The major changes are related to the aspects of pitch level, speaking rate, pause pattern, and breathiness. The extent of change depends presumably on the physical fitness and well-being of the person, as well as the intensity of PA. The general wellness of a person is further related to his/her physical literacy (PL), which refers to a holistic description of engagement in PA. This paper presents the development of a Cantonese speech database that contains audio recordings of speech before and after physical exercises of different intensity levels. The corpus design and data collection process are described. Preliminary results of acoustical analysis are presented to illustrate the impact of PA on pitch level, pitch range, speaking and articulation rate, and time duration of pauses. It is also noted that the effect of PA is correlated with some of the PA and PL measures.

[40]  arXiv:2111.12577 (cross-list from cs.CV) [pdf, other]
Title: A Method for Evaluating the Capacity of Generative Adversarial Networks to Reproduce High-order Spatial Context
Comments: Submitted to IEEE-TPAMI. Early version with partial results has been accepted for poster presentation at SPIE-MI 2022
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Machine Learning (stat.ML)

Generative adversarial networks are a kind of deep generative model with the potential to revolutionize biomedical imaging. This is because GANs have a learned capacity to draw whole-image variates from a lower-dimensional representation of an unknown, high-dimensional distribution that fully describes the input training images. The overarching problem with GANs in clinical applications is that there is no adequate or automatic means of assessing the diagnostic quality of images generated by GANs. In this work, we demonstrate several tests of the statistical accuracy of images output by two popular GAN architectures. We designed several stochastic object models (SOMs) of distinct features that can be recovered after generation by a trained GAN. Several of these features are high-order, algorithmic pixel-arrangement rules which are not readily expressed in covariance matrices. We designed and validated statistical classifiers to detect the known arrangement rules. We then tested the rates at which the different GANs correctly reproduced the rules under a variety of training scenarios and degrees of feature-class similarity. We found that ensembles of generated images can appear accurate visually, and correspond to low Frechet Inception Distance scores (FID), while not exhibiting the known spatial arrangements. Furthermore, GANs trained on a spectrum of distinct spatial orders did not respect the given prevalence of those orders in the training data. The main conclusion is that while low-order ensemble statistics are largely correct, there are numerous quantifiable errors per image that plausibly can affect subsequent use of the GAN-generated images.

[41]  arXiv:2111.12588 (cross-list from cs.SD) [pdf, other]
Title: Towards Cross-Cultural Analysis using Music Information Dynamics
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

A music piece is comprehended both hierarchically, from sonic events to melodies, and sequentially, in the form of repetition and variation. Music from different cultures establishes different aesthetics by having different style conventions on these two aspects. We propose a framework that could be used to quantitatively compare music from different cultures by looking at these two aspects.
The framework is based on a Music Information Dynamics model, a Variable Markov Oracle (VMO), and is extended with a variational representation learning of audio. A variational autoencoder (VAE) is trained to map audio fragments into a latent representation. The latent representation is fed into a VMO. The VMO then learns a clustering of the latent representation via a threshold that maximizes the information rate of the quantized latent representation sequence. This threshold effectively controls the sensibility of the predictive step to acoustic changes, which determines the framework's ability to track repetitions on longer time scales. This approach allows characterization of the overall information contents of a musical signal at each level of acoustic sensibility.
Our findings under this framework show that sensibility to subtle acoustic changes is higher for East-Asian musical traditions, while the Western works exhibit longer motivic structures at higher thresholds of differences in the latent space. This suggests that a profile of information contents, analyzed as a function of the level of acoustic detail can serve as a possible cultural characteristic.

[42]  arXiv:2111.12604 (cross-list from stat.ME) [pdf, other]
Title: State-space deep Gaussian processes with applications
Authors: Zheng Zhao
Comments: See reproducible codes in this https URL Permanent link this http URL
Journal-ref: Doctoral dissertation, Aalto University, 2021
Subjects: Methodology (stat.ME); Signal Processing (eess.SP); Machine Learning (stat.ML)

This thesis is mainly concerned with state-space approaches for solving deep (temporal) Gaussian process (DGP) regression problems. More specifically, we represent DGPs as hierarchically composed systems of stochastic differential equations (SDEs), and we consequently solve the DGP regression problem by using state-space filtering and smoothing methods. The resulting state-space DGP (SS-DGP) models generate a rich class of priors compatible with modelling a number of irregular signals/functions. Moreover, due to their Markovian structure, SS-DGP regression problems can be solved efficiently by using Bayesian filtering and smoothing methods. The second contribution of this thesis is that we solve continuous-discrete Gaussian filtering and smoothing problems by using the Taylor moment expansion (TME) method. This induces a class of filters and smoothers that can be asymptotically exact in predicting the mean and covariance of stochastic differential equation (SDE) solutions. Moreover, the TME method and TME filters and smoothers are compatible with simulating SS-DGPs and solving their regression problems. Lastly, this thesis features a number of applications of state-space (deep) GPs. These applications mainly include (i) estimation of unknown drift functions of SDEs from partially observed trajectories and (ii) estimation of spectro-temporal features of signals.

[43]  arXiv:2111.12649 (cross-list from math.OC) [pdf, ps, other]
Title: Global Output Feedback Stabilization of Semilinear Reaction-Diffusion PDEs
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)

This paper addresses the topic of global output feedback stabilization of semilinear reaction-diffusion PDEs. The semilinearity is assumed to be confined to a sector condition. We consider two different types of actuation configurations, namely: bounded control operator and right Robin boundary control. The measurement is selected as a left Dirichlet trace. The control strategy is finite dimensional and is designed based on a linear version of the plant. We derive a set of sufficient conditions ensuring the global exponential stabilization of the semilinear reaction-diffusion PDE. These conditions are shown to be feasible provided the order of the controller is large enough and the size of the sector condition to which the semilinearity is confined is small enough.

[44]  arXiv:2111.12675 (cross-list from cs.IT) [pdf, other]
Title: The Surprising Benefits of Hysteresis in Unlimited Sampling: Theory, Algorithms and Experiments
Comments: 24 pages
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

The Unlimited Sensing Framework (USF) was recently introduced to overcome the sensor saturation bottleneck in conventional digital acquisition systems. At its core, the USF allows for high-dynamic-range (HDR) signal reconstruction by converting a continuous-time signal into folded, low-dynamic-range (LDR), modulo samples. HDR reconstruction is then carried out by algorithmic unfolding of the folded samples. In hardware, however, implementing an ideal modulo folding requires careful calibration, analog design and high precision. At the interface of theory and practice, this paper explores a computational sampling strategy that relaxes strict hardware requirements by compensating them via a novel, mathematically guaranteed recovery method. Our starting point is a generalized model for USF. The generalization relies on two new parameters modeling hysteresis and folding transients in addition to the modulo threshold. Hysteresis accounts for the mismatch between the reset threshold and the amplitude displacement at the folding time and we refer to a continuous transition period in the implementation of a reset as folding transient. Both these effects are motivated by our hardware experiments and also occur in previous, domain-specific applications. We show that the effect of hysteresis is beneficial for the USF and we leverage it to derive the first recovery guarantees in the context of our generalized USF model. Additionally, we show how the proposed recovery can be directly generalized for the case of lower sampling rates. Our theoretical work is corroborated by hardware experiments that are based on a hysteresis-enabled, modulo ADC testbed comprising off-the-shelf electronic components. Thus, by capitalizing on a collaboration between hardware and algorithms, our paper enables an end-to-end pipeline for HDR sampling allowing more flexible hardware implementations.
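
The folding and unfolding steps in the ideal case can be sketched as follows. This is a minimal illustration of ideal modulo sampling only; it does not model hysteresis or folding transients and is not the recovery method proposed in the paper. The function names, the threshold `lam`, and the test signal are assumptions. The unfolding here uses the classical first-difference idea: if consecutive samples differ by less than the threshold, the folded difference equals the true difference, so a running sum restores the signal.

```python
import math


def fold(x, lam):
    """Ideal centered modulo: map x into the interval [-lam, lam)."""
    return (x + lam) % (2.0 * lam) - lam


def unfold(folded, lam):
    """Recover an oversampled signal from ideal modulo samples.

    Assumes |x[n] - x[n-1]| < lam for all n and that the first sample
    already lies inside [-lam, lam), so it needs no correction.
    """
    recovered = [folded[0]]
    for prev, cur in zip(folded, folded[1:]):
        step = fold(cur - prev, lam)  # folded difference equals the true difference
        recovered.append(recovered[-1] + step)
    return recovered


# Assumed test signal: a sinusoid with amplitude 3, i.e. three times the threshold.
lam = 1.0
signal = [3.0 * math.sin(2.0 * math.pi * n / 200.0) for n in range(400)]
folded = [fold(v, lam) for v in signal]
recovered = unfold(folded, lam)
max_error = max(abs(r - s) for r, s in zip(recovered, signal))
```

The folded samples never leave the low-dynamic-range interval even though the input exceeds it, and `max_error` stays at the level of floating-point rounding. Real hardware deviates from this ideal fold, which is the gap the generalized model in the abstract is meant to address.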

[45]  arXiv:1807.07686 (replaced) [pdf, other]
Title: Exact minimum number of bits to stabilize a linear system
Comments: Extended version of the paper accepted to IEEE Transactions on Automatic Control
Journal-ref: IEEE Transactions on Automatic Control, Oct. 2022
Subjects: Systems and Control (eess.SY)
[46]  arXiv:1910.02534 (replaced) [pdf, other]
Title: The CEO problem with inter-block memory
Journal-ref: IEEE Transactions on Information Theory, v. 67, No. 12, pp. 7752--7768, Dec. 2021
Subjects: Information Theory (cs.IT); Systems and Control (eess.SY)
[47]  arXiv:2007.06226 (replaced) [pdf, other]
Title: AMITE: A Novel Polynomial Expansion for Analyzing Neural Network Nonlinearities
Comments: 13 pages, 2 tables, 9 figures, LaTeX; minor grammar updates, equation numbering, and exposition clarification updates
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY); Machine Learning (stat.ML)
[48]  arXiv:2009.10993 (replaced) [pdf]
Title: Deep Learning-Based Reconstruction of Interventional Tools from Four X-Ray Projections for Tomographic Interventional Guidance
Comments: Updated with final version of this article, as published in Medical Physics; 14 pages, 12 figures
Journal-ref: Eulig, E., et al. Deep learning-based reconstruction of interventional tools and devices from four X-ray projections for tomographic interventional guidance. Medical Physics. 2021; 48: 5837-5850
Subjects: Medical Physics (physics.med-ph); Image and Video Processing (eess.IV)
[49]  arXiv:2011.04138 (replaced) [pdf, other]
Title: Posture Adjustment for a Wheel-legged Robotic System via Leg Force Control with Prescribed Transient Performance
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
[50]  arXiv:2012.00968 (replaced) [pdf, other]
Title: Reconfigurable Intelligent Surfaces in Action for Non-Terrestrial Networks
Comments: 7 pages, 6 figures
Subjects: Information Theory (cs.IT); Machine Learning (cs.LG); Signal Processing (eess.SP)
[51]  arXiv:2012.04515 (replaced) [pdf, other]
Title: Digital Gimbal: End-to-end Deep Image Stabilization with Learnable Exposure Times
Comments: CVPR 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[52]  arXiv:2012.12468 (replaced) [pdf, other]
Title: CN-Celeb: multi-genre speaker recognition
Comments: submitted to Speech Communication
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[53]  arXiv:2012.12471 (replaced) [pdf, other]
Title: A Principle Solution for Enroll-Test Mismatch in Speaker Recognition
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[54]  arXiv:2101.03329 (replaced) [pdf, ps, other]
Title: Coupling a generative model with a discriminative learning framework for speaker verification
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[55]  arXiv:2102.04525 (replaced) [pdf, other]
Title: Unified Focal loss: Generalising Dice and cross entropy-based losses to handle class imbalanced medical image segmentation
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[56]  arXiv:2103.13581 (replaced) [pdf, other]
Title: EfficientTDNN: Efficient Architecture Search for Speaker Recognition
Comments: 13 pages, 12 figures, submitted to TASLP
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI)
[57]  arXiv:2105.05458 (replaced) [pdf, other]
Title: Distributionally Robust Graph Learning from Smooth Signals under Moment Uncertainty
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Optimization and Control (math.OC)
[58]  arXiv:2105.09429 (replaced) [pdf, other]
Title: Point process simulation of generalised inverse Gaussian processes and estimation of the Jaeger integral
Subjects: Methodology (stat.ME); Signal Processing (eess.SP); Probability (math.PR)
[59]  arXiv:2106.00058 (replaced) [pdf, other]
Title: PUDLE: Implicit Acceleration of Dictionary Learning by Backpropagation
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Machine Learning (stat.ML)
[60]  arXiv:2106.10933 (replaced) [pdf, ps, other]
Title: Semi-uniform Input-to-state Stability of Infinite-dimensional Systems
Authors: Masashi Wakaiki
Comments: 27 pages
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
[61]  arXiv:2107.08704 (replaced) [pdf, other]
Title: Leveraging Secondary Reflections and Mitigating Interference in Multi-IRS/RIS Aided Wireless Network
Comments: 14 pages (submitted to an IEEE Journal)
Subjects: Signal Processing (eess.SP)
[62]  arXiv:2107.09507 (replaced) [pdf]
Title: EEG-based Cross-Subject Driver Drowsiness Recognition with an Interpretable Convolutional Neural Network
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Neurons and Cognition (q-bio.NC)
[63]  arXiv:2107.10294 (replaced) [pdf, other]
Title: User-Centric Perspective in Random Access Cell-Free Aided by Spatial Separability
Comments: 14 pages, 8 figures
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
[64]  arXiv:2108.01115 (replaced) [pdf, other]
Title: Triangular body-cover model of the vocal folds with coordinated activation of the five intrinsic laryngeal muscles
Comments: Primitive version, 54 pages, 8 figures, 4 tables. The present manuscript has been submitted to the Journal of the Acoustical Society of America (JASA)
Subjects: Medical Physics (physics.med-ph); Sound (cs.SD); Audio and Speech Processing (eess.AS); Biological Physics (physics.bio-ph)
[65]  arXiv:2108.13033 (replaced) [pdf, ps, other]
Title: Resource Allocation for Active IRS-Assisted Multiuser Communication Systems
Comments: 3 figures, submitted to Asilomar 2021
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
[66]  arXiv:2109.01303 (replaced) [pdf, other]
Title: Self-supervised Multi-class Pre-training for Unsupervised Anomaly Detection and Segmentation in Medical Images
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[67]  arXiv:2109.08717 (replaced) [pdf]
Title: The Optimization of the Constant Flow Parallel Micropump Using RBF Neural Network
Comments: Accepted to International Conference on Robotics and Automation Engineering (ICRAE), 2021
Subjects: Machine Learning (cs.LG); Robotics (cs.RO); Systems and Control (eess.SY)
[68]  arXiv:2110.09748 (replaced) [pdf, other]
Title: User Based Design and Evaluation Pipeline for Indoor Airships
Comments: Submitting to ICRA 2022
Subjects: Systems and Control (eess.SY)
[69]  arXiv:2111.03380 (replaced) [pdf, other]
Title: Integral state-feedback control of linear time-varying systems: A performance preserving approach
Subjects: Systems and Control (eess.SY)
[70]  arXiv:2111.07914 (replaced) [pdf]
Title: Experimental Investigation on the Friction-induced Vibration with Periodic Characteristics in a Running-in Process under Lubrication
Subjects: Signal Processing (eess.SP)
[71]  arXiv:2111.10954 (replaced) [pdf, other]
Title: Generation Drawing/Grinding Trajectory Based on Hierarchical CVAE
Comments: 7 pages, 18 figures
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
[72]  arXiv:2111.11843 (replaced) [pdf, other]
Title: U-shape Transformer for Underwater Image Enhancement
Comments: 8 pages, 6 images
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
