Electrical Engineering and Systems Science

New submissions

New submissions for Tue, 25 Jan 22

[1]  arXiv:2201.08865 [pdf, other]
Title: On the in vivo recognition of kidney stones using machine learning
Comments: Paper submitted to Computer Methods and Programs in Biomedicine (CMPB)
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Determining the type of kidney stones allows urologists to prescribe a treatment to avoid recurrence of renal lithiasis. An automated in-vivo image-based classification method would be an important step towards an immediate identification of the kidney stone type required as a first phase of the diagnosis. In the literature it was shown on ex-vivo data (i.e., in very controlled scene and image acquisition conditions) that an automated kidney stone classification is indeed feasible. This pilot study compares the kidney stone recognition performances of six shallow machine learning methods and three deep-learning architectures which were tested with in-vivo images of the four most frequent urinary calculi types acquired with an endoscope during standard ureteroscopies. This contribution details the database construction and the design of the tested kidney stones classifiers. Even if the best results were obtained by the Inception v3 architecture (weighted precision, recall and F1-score of 0.97, 0.98 and 0.97, respectively), it is also shown that choosing an appropriate colour space and texture features allows a shallow machine learning method to approach closely the performances of the most promising deep-learning methods (the XGBoost classifier led to weighted precision, recall and F1-score values of 0.96). This paper is the first one that explores the most discriminant features to be extracted from images acquired during ureteroscopies.
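
The weighted precision, recall and F1-score quoted above are commonly computed as per-class scores averaged with weights proportional to each class's number of true samples. The sketch below illustrates that convention under stated assumptions: the function name `weighted_scores` and the 2-class confusion matrix are hypothetical and are not taken from the paper, whose exact evaluation protocol is not given in the abstract.

```python
def weighted_scores(confusion):
    """Support-weighted precision, recall and F1.

    confusion[i][j] = number of samples of true class i predicted as class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    w_prec = w_rec = w_f1 = 0.0
    for k in range(n):
        tp = confusion[k][k]
        support = sum(confusion[k])                         # true samples of class k
        predicted = sum(confusion[i][k] for i in range(n))  # samples predicted as k
        prec = tp / predicted if predicted else 0.0
        rec = tp / support if support else 0.0
        f1 = 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0
        weight = support / total                            # class share of the data
        w_prec += weight * prec
        w_rec += weight * rec
        w_f1 += weight * f1
    return w_prec, w_rec, w_f1


# Hypothetical confusion matrix for illustration only (not data from the paper).
example = [[8, 2],
           [1, 9]]
precision, recall, f1 = weighted_scores(example)
print(precision, recall, f1)
```

With this weighting, the weighted recall equals the overall accuracy, which is one reason the three weighted scores tend to lie close together.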

[2]  arXiv:2201.08879 [pdf]
Title: A Grid Fault Tolerant Doubly Fed Induction Generator Wind Turbine via Series Connected Grid Side Converter
Journal-ref: Proceedings of the American Wind Energy Association Wind Power 2006 Conference, Pittsburgh, PA, June 2006
Subjects: Systems and Control (eess.SY)

With steadily increasing wind turbine penetration, regulatory standards for grid interconnection are evolving to require that wind generation systems ride through disturbances such as faults and support the grid during such events. Conventional modifications to the doubly fed induction generator (DFIG) architecture for providing ride-through result in limited control of the turbine shaft and grid current during fault events. A DFIG architecture in which the grid side converter is connected in series, as opposed to in parallel, with the grid connection has shown improved low voltage ride-through but poor power processing capabilities. In this paper, a unified DFIG wind turbine architecture which employs a parallel grid side rectifier and a series grid side converter is presented. The combination of these two converters enables unencumbered power processing and improved voltage disturbance ride-through. A dynamic model and control structure for this unified architecture are developed. The operation of the system is illustrated using computer simulations.

[3]  arXiv:2201.08909 [pdf]
Title: Uncertainty-Cognizant Model Predictive Control for Energy Management of Residential Buildings with PVT and Thermal Energy Storage
Comments: Index terms: Stochastic Model predictive control (MPC), building energy management systems (BEMSs), renewable energy resources (RES), thermal energy storage system (TESS), Mixed-integer linear stochastic optimization
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)

The building sector accounts for almost 40 percent of global energy consumption. This reveals a great opportunity to exploit renewable energy resources in buildings to achieve the climate target. In this context, this paper offers a building energy system comprising a heat pump and a thermal energy storage system, along with grid-connected photovoltaic thermal (PVT) collectors, to supply both the electric and thermal energy demands of the building at minimum operating cost. To this end, the paper develops a stochastic model predictive control (MPC) strategy to optimally determine the set-point of the whole building energy system while accounting for the uncertainties associated with the PVT energy generation. This system enables the building to (1) shift its electric demand from high-peak to off-peak hours and (2) sell electricity to the grid for energy arbitrage.

[4]  arXiv:2201.08928 [pdf, other]
Title: Joint CFO and Channel Estimation for RIS-aided Multi-user Massive MIMO Systems
Comments: 11 pages, 12 figures, this manuscript has been submitted
Subjects: Signal Processing (eess.SP)

Accurate channel estimation is essential to achieve the performance gains promised by the use of reconfigurable intelligent surfaces (RISs) in wireless communications. In the uplink of multi-user orthogonal frequency division multiple access (OFDMA) systems, synchronization errors such as carrier frequency offsets (CFOs) can significantly degrade the channel estimation performance. This becomes more critical in RIS-aided communications, as even a small channel estimation error leads to a significant performance loss. Motivated by this, we propose a joint CFO and channel estimation method for RIS-aided multi-user massive multiple-input multiple-output (MIMO) systems. Our proposed pilot structure allows accurate estimation of the CFOs without multi-user interference (MUI), using the same pilot resources for both CFO estimation and channel estimation. For joint estimation of multiple users' CFOs, a correlation-based approach is devised using the received signals at all BS antennas. Using least-squares (LS) estimation with the obtained CFO values, the channels of all users are jointly estimated. For optimization of the RIS phase shifts at the data transmission stage, we propose a projected gradient method (PGM). Simulation results demonstrate that the proposed method provides an improvement in the normalized mean-square error (NMSE) of channel estimation as well as in the bit error rate (BER) performance.

[5]  arXiv:2201.08930 [pdf, other]
Title: A Noise-Robust Self-supervised Pre-training Model Based Speech Representation Learning for Automatic Speech Recognition
Comments: Accepted by ICASSP 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

Wav2vec2.0 is a popular self-supervised pre-training framework for learning speech representations in the context of automatic speech recognition (ASR). It has been shown that wav2vec2.0 is robust to domain shift, while its noise robustness is still unclear. In this work, we therefore first analyze the noise robustness of wav2vec2.0 via experiments. We observe that wav2vec2.0 pre-trained on noisy data can obtain good representations and thus improve the ASR performance on the noisy test set, which however degrades performance on the clean test set. To avoid this issue, we propose an enhanced wav2vec2.0 model. Specifically, the noisy speech and the corresponding clean version are fed into the same feature encoder, where the clean speech provides training targets for the model. Experimental results reveal that the proposed method not only improves the ASR performance on the noisy test set, surpassing the original wav2vec2.0, but also incurs only a tiny performance decrease on the clean test set. In addition, the effectiveness of the proposed method is demonstrated under different types of noise conditions.

[6]  arXiv:2201.08931 [pdf, other]
Title: Frequency and Phase Synchronization in Distributed Antenna Arrays Based on Consensus Averaging and Kalman Filtering
Subjects: Signal Processing (eess.SP); Systems and Control (eess.SY)

A decentralized approach for joint frequency and phase synchronization in distributed antenna arrays is presented. The nodes in the array share their frequencies and phases with their neighboring nodes to align these parameters across the array. Our signal model includes the frequency drifts and phase jitters of the local oscillators as well as the frequency and phase estimation errors at the nodes and models them using practical statistics. A decentralized frequency and phase consensus (DFPC) algorithm is proposed which uses an average consensus method in which each node in the array iteratively updates its frequency and phase by computing an average of the frequencies and phases of its neighboring nodes. Simulation results show that upon convergence the DFPC algorithm can align the frequencies and phases of all the nodes up to a residual phase error that is governed by the oscillators and the estimation errors. To reduce this residual phase error and thus improve the synchronization between the nodes, a Kalman filter based decentralized frequency and phase consensus (KF-DFPC) algorithm is presented. The total residual phase error at the convergence of the KF-DFPC and DFPC algorithms is derived theoretically. The synchronization performances of these algorithms are compared to each other in light of this theoretical residual phase error by varying the duration of the signals, the connectivity of the nodes, the number of nodes in the array, and the signal to noise ratio of the transmitted signals. Simulation results demonstrate that the proposed KF-DFPC algorithm converges in fewer iterations than the DFPC algorithm. Furthermore, for shorter intervals between local information broadcasts, the KF-DFPC algorithm significantly outperforms the DFPC algorithm in reducing the residual total phase error, irrespective of the signal to noise ratio of the received signals.
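
The average consensus idea described above can be illustrated with a minimal, noise-free sketch. It is not the paper's DFPC or KF-DFPC algorithm: oscillator drift, phase jitter, estimation errors and the Kalman filter are all omitted. The function name `consensus_step`, the 5-node ring topology, the numeric values, and the choice to include each node's own value in its average are assumptions made for illustration.

```python
def consensus_step(values, neighbors):
    """One synchronous update: each node averages its own value with its neighbors'."""
    new_values = []
    for i, v in enumerate(values):
        group = [v] + [values[j] for j in neighbors[i]]
        new_values.append(sum(group) / len(group))
    return new_values


# Hypothetical 5-node ring: node i talks to nodes i-1 and i+1.
ring = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}

# Hypothetical normalized frequencies and (unwrapped, small) phases in radians.
freqs = [1.00, 1.02, 0.98, 1.05, 0.95]
phases = [0.1, -0.3, 0.5, 0.0, 0.2]

for _ in range(100):
    freqs = consensus_step(freqs, ring)
    phases = consensus_step(phases, ring)

print(freqs)   # all entries close to the initial mean frequency
print(phases)  # all entries close to the initial mean phase
```

Because every node in the ring has the same number of neighbors, the update preserves the network-wide average, so all nodes converge to the mean of the initial values. With noisy oscillators and estimates, as in the abstract, a residual error would remain.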

[7]  arXiv:2201.08934 [pdf, other]
Title: Supervised and Self-supervised Pretraining Based COVID-19 Detection Using Acoustic Breathing/Cough/Speech Signals
Comments: Accepted by ICASSP 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

In this work, we propose a bi-directional long short-term memory (BiLSTM) network based COVID-19 detection method using breath/speech/cough signals. By training the network on each type of acoustic signal separately, we build individual models for the three tasks, whose parameters are averaged to obtain an average model, which is then used as the initialization for the BiLSTM model training of each task. This initialization method significantly improves the performance on the three tasks, surpassing the official baseline results. In addition, we utilize a public pre-trained wav2vec2.0 model and further pre-train it using the official DiCOVA datasets. This wav2vec2.0 model is used to extract high-level features of the sound as the model input to replace conventional mel-frequency cepstral coefficient (MFCC) features. Experimental results reveal that using high-level features together with MFCC features can improve the performance. To further improve the performance, we also apply preprocessing techniques such as silent segment removal, amplitude normalization and time-frequency masking. The proposed detection model is evaluated on the DiCOVA dataset, and results show that our method achieves an area under curve (AUC) score of 88.44% on the blind test in the fusion track.
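
The parameter-averaging initialization described above can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: models are represented as plain dictionaries of flat parameter lists, and the name `average_parameters` and all numeric values are hypothetical. A real BiLSTM would hold tensors, but the element-wise mean is the same idea.

```python
def average_parameters(models):
    """Element-wise mean of parameter dicts that share the same keys and shapes."""
    averaged = {}
    for key in models[0]:
        vectors = [m[key] for m in models]
        averaged[key] = [sum(vals) / len(vals) for vals in zip(*vectors)]
    return averaged


# Hypothetical parameters of three individually trained task models.
breath = {"w": [0.2, 0.4], "b": [0.0]}
cough = {"w": [0.6, 0.2], "b": [0.3]}
speech = {"w": [0.1, 0.3], "b": [0.3]}

init = average_parameters([breath, cough, speech])

# Each task then starts its own training from a separate copy of the average model.
task_inits = {
    name: {k: list(v) for k, v in init.items()}
    for name in ("breath", "cough", "speech")
}
print(init)
```

Copying the averaged parameters per task keeps the three subsequent training runs independent of one another.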

[8]  arXiv:2201.08935 [pdf, other]
Title: SAR Image Change Detection Based on Multiscale Capsule Network
Journal-ref: in IEEE Geoscience and Remote Sensing Letters, vol. 18, no. 3, pp. 484-488, March 2021
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Traditional synthetic aperture radar (SAR) image change detection methods based on convolutional neural networks (CNNs) face the challenges of speckle noise and deformation sensitivity. To mitigate these issues, we propose a Multiscale Capsule Network (Ms-CapsNet) to extract the discriminative information between the changed and unchanged pixels. On the one hand, the multiscale capsule module is employed to exploit the spatial relationship of features. Therefore, equivariant properties can be achieved by aggregating the features from different positions. On the other hand, an adaptive fusion convolution (AFC) module is designed for the proposed Ms-CapsNet. Higher semantic features can be captured for the primary capsules. Features extracted by the AFC module significantly improve the robustness to speckle noise. The effectiveness of the proposed Ms-CapsNet is verified on three real SAR datasets. The comparison experiments with four state-of-the-art methods demonstrate the efficiency of the proposed method. Our codes are available at https://github.com/summitgao/SAR_CD_MS_CapsNet .

[9]  arXiv:2201.08944 [pdf, other]
Title: DCNGAN: A Deformable Convolutional-Based GAN with QP Adaptation for Perceptual Quality Enhancement of Compressed Video
Comments: 5 pages, 4 figures
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)

In this paper, we propose a deformable convolution-based generative adversarial network (DCNGAN) for perceptual quality enhancement of compressed videos. DCNGAN is also adaptive to the quantization parameters (QPs). Compared with optical flows, deformable convolutions are more effective and efficient to align frames. Deformable convolutions can operate on multiple frames, thus leveraging more temporal information, which is beneficial for enhancing the perceptual quality of compressed videos. Instead of aligning frames in a pairwise manner, the deformable convolution can process multiple frames simultaneously, which leads to lower computational complexity. Experimental results demonstrate that the proposed DCNGAN outperforms other state-of-the-art compressed video quality enhancement algorithms.

[10]  arXiv:2201.08955 [pdf, other]
Title: Modality Bank: Learn multi-modality images across data centers without sharing medical data
Comments: arXiv admin note: substantial text overlap with arXiv:2012.08604
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Multi-modality images have been widely used and provide comprehensive information for medical image analysis. However, acquiring all modalities at all institutes is costly and often impossible in clinical settings. To leverage more comprehensive multi-modality information, we propose a privacy-secured decentralized multi-modality adaptive learning architecture named ModalityBank. Our method can learn a set of effective domain-specific modulation parameters plugged into a common domain-agnostic network. We demonstrate that, by switching between different sets of configurations, the generator can output high-quality images for a specific modality. Our method can also complete the missing modalities across all data centers, and thus can be used for modality completion purposes. The downstream task trained on the synthesized multi-modality samples can achieve higher performance than learning from one real data center and achieve close-to-real performance compared with training on all real images.

[11]  arXiv:2201.08993 [pdf, other]
Title: Topological Signal Representation and Processing over Cell Complexes
Comments: Submitted to IEEE Transactions on Signal Processing, January 2022
Subjects: Signal Processing (eess.SP)

Topological Signal Processing (TSP) over simplicial complexes is a recently proposed framework that generalizes graph signal processing (GSP). It aims to analyze signals defined over sets of any order (i.e., not only vertices of a graph) and to capture relations of any order present in the observed data. Our goal in this paper is to extend the TSP framework to deal with signals defined over cell complexes, i.e., topological spaces that are not constrained to satisfy the inclusion property of simplicial complexes, namely the condition that, if a set belongs to the complex, then all its subsets belong to the complex as well. We start by showing how to translate the algebraic topological tools to deal with signals defined over cell complexes, and then we propose a method to infer the structure of the cell complex from data. Then, we address the filtering problem over cell complexes and we provide the theoretical conditions under which the independent filtering of the solenoidal and irrotational components of edge signals brings a performance improvement with respect to a common filtering strategy. Furthermore, we propose a distributed strategy to filter the harmonic signals with the aim of retrieving the sparsest representation of the harmonic components. Finally, we quantify the advantages of using cell complexes instead of simplicial complexes, in terms of the sparsity/accuracy trade-off and of the signal recovery accuracy from sparse samples, using both simulated and real flows measured on data traffic and transportation networks.

[12]  arXiv:2201.08998 [pdf, other]
Title: Nonlinear $H_{\infty}$ Filtering on the Special Orthogonal Group $SO(3)$ using Vector Directions
Subjects: Systems and Control (eess.SY)

The problem of $H_{\infty}$ filtering for attitude estimation using rotation matrices and vector measurements is studied. Starting from a storage function on the Special Orthogonal Group $SO(3)$, a dissipation inequality is considered, and a deterministic nonlinear $H_{\infty}$ filter is derived which respects a given upper bound $\gamma$ on the energy gain from exogenous disturbances and initial estimation errors to a generalized estimation error. The results are valid for all estimation errors which correspond to an angular error of less than $\pi/2$ radians in terms of the axis-angle representation. The approach builds on earlier results on attitude estimation, in particular nonlinear $H_{\infty}$ filtering using quaternions, and proposes a novel filter developed directly on $SO(3)$. The proposed filter employs the same innovation term as the Multiplicative Extended Kalman Filter (MEKF), as well as a matrix gain updated in accordance with a Riccati-type gain update equation. However, in contrast to the MEKF, the proposed filter has an additional tuning gain, $\gamma$, which enables it to be more aggressive during transients. The filter is simulated for different conditions, and the results are compared with those obtained using the continuous-time quaternion MEKF and the Geometric Approximate Minimum Energy (GAME) filter. Simulations indicate competitive performance. In particular, the GAME filter has the best transient performance, followed by the proposed $H_{\infty}$ filter and the quaternion MEKF. All three filters have similar steady-state performance. Therefore, the proposed filter can be seen as a MEKF variant which achieves better transient performance without significant degradation in steady-state noise rejection.

[13]  arXiv:2201.09030 [pdf, other]
Title: Slotted ALOHA and CSMA Protocols for FMCW Radar Networks
Subjects: Systems and Control (eess.SY)

We study medium access in FMCW radar networks. We assume that all the radars use the same parameters, e.g., chirp duration, chirp slope, cutoff frequency, number of chirps per packet, etc., and propose and analyze slotted ALOHA and CSMA protocols to mitigate narrowband interference. We define a notion of throughput to quantify the performance of the proposed protocols. In the case of ALOHA, we analyze interference probability and throughput as functions of the system parameters. We observe that interference probability and throughput may behave differently than in wireless communication networks. For instance, if the number of chirps per packet is larger than one, the interference probabilities may be smaller for higher transmission rates. We define a medium sensing procedure, referred to as clear channel assessment (CCA), as a part of the proposed CSMA, and also define CCA success and failure events. In CSMA, the radars transmit only after a successful CCA. We study CCA success probability, interference probability and throughput as functions of the system parameters. We observe that, unlike in wireless communication networks, using the highest possible attempt rates may maximize throughput in a few network scenarios. We perform an extensive simulation to verify our analytical results and to compare slotted ALOHA and CSMA. We observe that CSMA outperforms ALOHA in all realistic scenarios.

[14]  arXiv:2201.09091 [pdf, ps, other]
Title: Target Sensing with Intelligent Reflecting Surface: Architecture and Performance
Comments: Accepted by IEEE Journal on Selected Areas in Communications Special Issue on Integrated Sensing and Communication
Subjects: Signal Processing (eess.SP)

Intelligent reflecting surface (IRS) has emerged as a promising technology to reconfigure the radio propagation environment by dynamically controlling wireless signal's amplitude and/or phase via a large number of reflecting elements. In contrast to the vast literature on studying IRS's performance gains in wireless communications, we study in this paper a new application of IRS for sensing/localizing targets in wireless networks. Specifically, we propose a new self-sensing IRS architecture where the IRS controller is capable of transmitting probing signals that are not only directly reflected by the target (referred to as the direct echo link), but also consecutively reflected by the IRS and then the target (referred to as the IRS-reflected echo link). Moreover, dedicated sensors are installed at the IRS for receiving both the direct and IRS-reflected echo signals from the target, such that the IRS can sense the direction of its nearby target by applying a customized multiple signal classification (MUSIC) algorithm. However, since the angle estimation mean square error (MSE) by the MUSIC algorithm is intractable, we propose to optimize the IRS passive reflection for maximizing the average echo signals' total power at the IRS sensors and derive the resultant Cramer-Rao bound (CRB) of the angle estimation MSE. Last, numerical results are presented to show the effectiveness of the proposed new IRS sensing architecture and algorithm, as compared to other benchmark sensing systems/algorithms.

[15]  arXiv:2201.09095 [pdf, other]
Title: Excitation allocation for generic identifiability of linear dynamic networks with fixed modules
Subjects: Systems and Control (eess.SY)

Identifiability of linear dynamic networks requires the presence of a sufficient number of external excitation signals. The problem of allocating a minimal number of external signals for guaranteeing generic network identifiability has been recently addressed in the literature. Here we will extend that work by explicitly incorporating the situation that some network modules are known, and thus are fixed in the parametrized model set. The graphical approach introduced earlier is extended to this situation, showing that the presence of fixed modules reduces the required number of external signals. An algorithm is presented that allocates the external signals in a systematic fashion.

[16]  arXiv:2201.09124 [pdf, ps, other]
Title: Copula-Based Modeling of RIS-Assisted Communications: Outage Probability Analysis
Subjects: Signal Processing (eess.SP)

Statistical characterization of the signal-to-noise ratio (SNR) of reconfigurable intelligent surface (RIS)-assisted communications in the presence of phase noise is an important open issue. In this letter, we exploit the concept of copula modeling to capture the non-standard dependence features that appear due to the presence of discrete phase noise. In particular, we consider the outage probability of RIS systems in Rayleigh fading channels and provide joint distributions to characterize the dependencies due to the use of finite resolution phase shifters at the RIS. Numerical assessments confirm the validity of closed-form expressions of the outage probability and motivate the use of bivariate copula for further RIS studies.

[17]  arXiv:2201.09132 [pdf, other]
Title: CNN-based regularisation for CT image reconstructions
Authors: Attila Juhos
Subjects: Image and Video Processing (eess.IV); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

X-ray computed tomographic infrastructures are medical imaging modalities that rely on the acquisition of rays crossing examined objects while measuring their intensity decrease. Physical measurements are post-processed by mathematical reconstruction algorithms that may offer weaker or top-notch consistency guarantees on the computed volumetric field. Superior results are provided on the account of an abundance of low-noise measurements being supplied. Nonetheless, such a scanning process would expose the examined body to an undesirably large-intensity and long-lasting ionising radiation, imposing severe health risks. One main objective of the ongoing research is the reduction of the number of projections while keeping the quality performance stable. Due to the under-sampling, the noise occurring inherently because of photon-electron interactions is now supplemented by reconstruction artifacts. Recently, deep learning methods, especially fully convolutional networks have been extensively investigated and proven to be efficient in filtering such deviations. In this report algorithms are presented that take as input a slice of a low-quality reconstruction of the volume in question and aim to map it to the reconstruction that is considered ideal, the ground truth. Above that, the first system comprises two additional elements: firstly, it ensures the consistency with the measured sinogram, secondly it adheres to constraints proposed in classical compressive sampling theory. The second one, inspired by classical ways of solving the inverse problem of reconstruction, takes an iterative approach to regularise the hypothesis in the direction of the correct result.

[18]  arXiv:2201.09163 [pdf]
Title: Pulmonary Fissure Segmentation in CT Images Based on ODoS Filter and Shape Features
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Prior knowledge of pulmonary anatomy plays a vital role in the diagnosis of lung diseases. In CT images, pulmonary fissure segmentation is a formidable task due to various factors. To address the challenge, a useful approach based on an ODoS filter and shape features is presented for pulmonary fissure segmentation. Here, we adopt an ODoS filter that merges the orientation information and magnitude information to highlight structure features for fissure enhancement, which can effectively distinguish between pulmonary fissures and clutter. Motivated by the fact that pulmonary fissures appear as linear structures in 2D space and planar structures in 3D space in the orientation field, an orientation curvature criterion and an orientation partition scheme are fused to separate fissure patches from other structures in different orientation partitions, which can suppress part of the clutter. Considering the shape difference between pulmonary fissures and tubular structures in the magnitude field, a shape measure approach and a 3D skeletonization model are combined to segment pulmonary fissures for clutter removal. When applying our scheme to 55 chest CT scans acquired from the publicly available LOLA11 dataset, the median F1-score, False Discovery Rate (FDR), and False Negative Rate (FNR) are 0.896, 0.109, and 0.100, respectively, which indicates that the presented method has a satisfactory pulmonary fissure segmentation performance.

[19]  arXiv:2201.09180 [pdf, other]
Title: Prescribed Performance Adaptive Fixed-Time Attitude Tracking Control of a 3-DOF Helicopter with Small Overshoot
Authors: Xidong Wang
Comments: 5 pages, 4 figures
Subjects: Systems and Control (eess.SY)

In this article, a novel prescribed performance adaptive fixed-time backstepping control strategy is investigated for the attitude tracking of a 3-DOF helicopter. First, a new unified barrier function (UBF) is designed to convert the prescribed performance constrained system into an unconstrained one. Then, a fixed-time (FxT) backstepping control framework is established to achieve the attitude tracking. By virtue of a newly proposed inequality, a non-singular virtual control law is constructed. In addition, an FxT differentiator with a compensation mechanism is employed to overcome the problem of "explosion of complexity". Moreover, a modified adaptive law is developed to approximate the upper bound of the disturbances. To obtain a less conservative and more accurate approximation of the settling time, an improved FxT stability theorem is proposed. Based on this theorem, it is proved that all signals of the system are FxT bounded, and the tracking error converges to a preset domain with small overshoot in a user-defined time. Finally, the feasibility and effectiveness of the presented control strategy are confirmed by numerical simulations.

[20]  arXiv:2201.09204 [pdf, ps, other]
Title: $α$-Fairness User Pairing for Downlink NOMA Systems with Imperfect Successive Interference Cancellation
Subjects: Signal Processing (eess.SP)

Non-orthogonal multiple access (NOMA) is considered one of the predominant multiple access techniques for next-generation cellular networks. We consider a 2-user pair downlink NOMA system with imperfect successive interference cancellation (SIC). We consider bounds on the power allocation factors and then formulate the power allocation as an optimization problem to achieve $\alpha$-Fairness among the paired users. We show that the $\alpha$-Fairness based power allocation factor coincides with the lower bound on the power allocation factor in the case of perfect SIC and $\alpha > 2$. Further, as long as the proposed criterion is satisfied, it converges to the upper bound with increasing imperfection in SIC. Similarly, we show that, for $0<\alpha<1$, the optimal power allocation factor coincides with the derived lower bound on power allocation. Based on these observations, we then propose a low complexity sub-optimal algorithm. Through extensive simulations, we analyse the performance of the proposed algorithm and compare it against the state-of-the-art algorithms. We show that even though Near-Far based pairing achieves better fairness than the proposed algorithms, it fails to achieve rates equivalent to its orthogonal multiple access counterparts with increasing imperfections in SIC. Further, we show that the proposed optimal and sub-optimal algorithms achieve significant improvements in terms of fairness as compared to the state-of-the-art algorithms.
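
For readers unfamiliar with $\alpha$-fairness, the sketch below shows the widely used $\alpha$-fair utility, $U_\alpha(R) = \log R$ for $\alpha = 1$ and $R^{1-\alpha}/(1-\alpha)$ otherwise, summed over user rates. It is assumed here that the paper uses this standard form; the abstract does not state its exact objective, and the NOMA rate model with imperfect SIC is not reproduced. The function name and the example rate pairs are hypothetical.

```python
import math


def alpha_fair_utility(rates, alpha):
    """Sum of the standard alpha-fair utilities of strictly positive user rates."""
    if alpha == 1:
        return sum(math.log(r) for r in rates)
    return sum(r ** (1 - alpha) / (1 - alpha) for r in rates)


# Two hypothetical rate pairs: one unbalanced with a higher sum rate, one balanced.
unbalanced = [0.5, 4.0]
balanced = [2.0, 2.0]

for alpha in (0, 1, 2):
    u_unb = alpha_fair_utility(unbalanced, alpha)
    u_bal = alpha_fair_utility(balanced, alpha)
    preferred = "balanced" if u_bal > u_unb else "unbalanced"
    print(f"alpha={alpha}: unbalanced={u_unb:.3f}, balanced={u_bal:.3f} -> {preferred}")
```

With $\alpha = 0$ the utility reduces to the sum rate, so the unbalanced pair wins. Larger $\alpha$ penalizes low individual rates more heavily and favors the balanced pair.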

[21]  arXiv:2201.09228 [pdf, other]
Title: Inter-Numerology Interference Pre-Equalization for 5G Mixed-Numerology Communications
Comments: 5 pages, 5 figures, Submitted to VTC 2022
Subjects: Signal Processing (eess.SP)

This article proposes a pre-equalization method to remove inter-numerology interference (INI) that occurs in multi-numerology OFDM frame structures of fifth-generation New Radio (5G-NR) and beyond on the transmitter side. In the literature, guard bands, filters and interference cancellation methods are used to reduce the INI. In this work, we mathematically model how the INI is generated and how it can be removed completely for multi-numerology systems on the transmitter side.

[22]  arXiv:2201.09240 [pdf, other]
Title: Learning-Driven Lossy Image Compression; A Comprehensive Survey
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)

In the realm of image processing and computer vision (CV), machine learning (ML) architectures are widely applied. Convolutional neural networks (CNNs) solve a wide range of image processing issues and can solve the image compression problem. Compression of images is necessary due to bandwidth and memory constraints. Images contain three different forms of information: helpful, redundant, and irrelevant. This paper aims to survey recent techniques, mostly for lossy image compression, that use ML architectures including different auto-encoders (AEs) such as convolutional auto-encoders (CAEs), variational auto-encoders (VAEs), and AEs with hyper-prior models, recurrent neural networks (RNNs), CNNs, generative adversarial networks (GANs), principal component analysis (PCA) and fuzzy means clustering. We divide all of the algorithms into several groups based on architecture. We cover still image compression in this survey. Various findings and possible future directions for researchers are highlighted. Open research problems such as out of memory (OOM), striped region distortion (SRD), aliasing, and compatibility of the frameworks with central processing unit (CPU) and graphics processing unit (GPU) simultaneously are explained. The majority of the publications in the compression domain surveyed are from the previous five years and use a variety of approaches.

[23]  arXiv:2201.09245 [pdf, other]
Title: Fast Transient Stability Prediction Using Grid-informed Temporal and Topological Embedding Deep Neural Network
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)

Transient stability prediction is essential for fast online assessment and for maintaining stable operation of power systems. The wide deployment of phasor measurement units (PMUs) promotes the development of data-driven approaches for transient stability assessment. This paper proposes the temporal and topological embedding deep neural network (TTEDNN) model to forecast transient stability from the early transient dynamics. The TTEDNN model can accurately and efficiently predict transient stability by extracting temporal and topological features from the time-series data of the early transient dynamics. The grid-informed adjacency matrix is used to incorporate the structural and electrical parameter information of the power grid. Transient dynamics simulation environments under single-node and multiple-node perturbations are used to test the performance of the TTEDNN model on the IEEE 39-bus and IEEE 118-bus power systems. The results show that the TTEDNN model has the best and most robust prediction performance. Furthermore, the TTEDNN model also demonstrates the transfer capability to predict transient stability in more complicated transient dynamics simulation environments.

[24]  arXiv:2201.09247 [pdf]
Title: Enhanced motor imagery-based EEG classification using a discriminative graph Fourier subspace
Subjects: Signal Processing (eess.SP)

Because it deals with irregular domains, graph signal processing (GSP) has attracted much attention, especially in brain imaging analysis. Motor imagery tasks are extensively utilized in brain-computer interface (BCI) systems that perform classification using features extracted from electroencephalogram (EEG) signals. In this paper, a GSP-based approach is presented for two-class motor imagery task classification. The proposed method exploits simultaneous diagonalization of two matrices that quantify the covariance structure of the graph spectral representation of data from each class, providing a discriminative subspace where distinctive features are extracted from the data. The performance of the proposed method was evaluated on Dataset IVa from BCI Competition III. Experimental results show that the proposed method outperforms two state-of-the-art alternative methods.

[25]  arXiv:2201.09293 [pdf]
Title: Three-dimensional structure from single two-dimensional diffraction intensity measurement
Journal-ref: Phys. Rev. Lett. 127, 063601 (2021)
Subjects: Image and Video Processing (eess.IV); Mesoscale and Nanoscale Physics (cond-mat.mes-hall); Computational Physics (physics.comp-ph); Optics (physics.optics)

Conventional three-dimensional (3D) imaging methods require multiple measurements of the sample in different orientations or by scanning. When the sample is probed with coherent waves, a single two-dimensional (2D) intensity measurement is sufficient, as it contains all the information of the 3D sample distribution. We show a method that allows reconstruction of the 3D sample distribution from a single 2D intensity measurement, at a z-resolution exceeding the classical limit. The method can be practical for radiation-sensitive materials, or where the experimental setup allows only one intensity measurement.

[26]  arXiv:2201.09314 [pdf]
Title: Perceptual cGAN for MRI Super-resolution
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Capturing high-resolution magnetic resonance (MR) images is a time-consuming process, which makes it unsuitable for medical emergencies and pediatric patients. Low-resolution MR imaging, by contrast, is faster than its high-resolution counterpart, but it compromises on fine details necessary for a more precise diagnosis. Super-resolution (SR), when applied to low-resolution MR images, can help increase their utility by synthetically generating high-resolution images with little additional time. In this paper, we present an SR technique for MR images that is based on generative adversarial networks (GANs), which have proven to be quite useful in generating sharp-looking details in SR. We introduce a conditional GAN with perceptual loss that is conditioned upon the input low-resolution image and improves the performance for isotropic and anisotropic MRI super-resolution.

[27]  arXiv:2201.09360 [pdf, other]
Title: POTHER: Patch-Voted Deep Learning-based Chest X-ray Bias Analysis for COVID-19 Detection
Comments: Submitted to International Conference on Computational Science 2022 in London
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

A critical step in the fight against COVID-19, which continues to have a catastrophic impact on people's lives, is the effective screening of patients presenting at clinics with severe COVID-19 symptoms. Chest radiography is one of the promising screening approaches. Many studies have reported detecting COVID-19 in chest X-rays accurately using deep learning. A serious limitation of many published approaches is insufficient attention paid to explaining decisions made by deep learning models. Using explainable artificial intelligence methods, we demonstrate that model decisions may rely on confounding factors rather than medical pathology. After an analysis of potential confounding factors found on chest X-ray images, we propose a novel method to minimise their negative impact. We show that our proposed method is more robust than previous attempts to counter confounding factors such as ECG leads in chest X-rays that often influence model classification decisions. In addition to being robust, our method achieves results comparable to the state-of-the-art. The source code and pre-trained weights are publicly available (https://github.com/tomek1911/POTHER).

[28]  arXiv:2201.09375 [pdf, other]
Title: Deep Unrolling for Magnetic Resonance Fingerprinting
Comments: Tech report. To appear in ISBI'2022. arXiv admin note: substantial text overlap with arXiv:2006.15271
Subjects: Image and Video Processing (eess.IV); Signal Processing (eess.SP)

Magnetic Resonance Fingerprinting (MRF) has emerged as a promising quantitative MR imaging approach. Deep learning methods have been proposed for MRF and have demonstrated improved performance over classical compressed sensing algorithms. However, many of these end-to-end models are physics-free, while consistency of the predictions with respect to the physical forward model is crucial for reliably solving inverse problems. To address this, [1] recently proposed a proximal gradient descent framework that directly incorporates the forward acquisition and Bloch dynamic models within an unrolled learning mechanism. However, [1] only evaluated the unrolled model on synthetic data using Cartesian sampling trajectories. In this paper, as a complement to [1], we investigate other choices of encoders to build the proximal neural network, and evaluate the deep unrolling algorithm on real accelerated MRF scans with non-Cartesian k-space sampling trajectories.

[29]  arXiv:2201.09376 [pdf, other]
Title: ReconFormer: Accelerated MRI Reconstruction Using Recurrent Transformer
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Accelerating the magnetic resonance imaging (MRI) reconstruction process is a challenging ill-posed inverse problem due to the excessive under-sampling operation in k-space. In this paper, we propose a recurrent transformer model, namely ReconFormer, for MRI reconstruction, which can iteratively reconstruct high-fidelity magnetic resonance images from highly under-sampled k-space data. In particular, the proposed architecture is built upon Recurrent Pyramid Transformer Layers (RPTL), which jointly exploit intrinsic multi-scale information at every architecture unit as well as the dependencies of the deep feature correlation through recurrent states. Moreover, the proposed ReconFormer is lightweight since it employs the recurrent structure for its parameter efficiency. We validate the effectiveness of ReconFormer on multiple datasets with different magnetic resonance sequences and show that it achieves significant improvements over the state-of-the-art methods with better parameter efficiency. Implementation code will be available at https://github.com/guopengf/ReconFormer.

[30]  arXiv:2201.09400 [pdf, other]
Title: Fast MRI Reconstruction: How Powerful Transformers Are?
Comments: 5 pages, 5 figures, EMBC 2022
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Magnetic resonance imaging (MRI) is a widely used non-radiative and non-invasive method for clinical interrogation of organ structures and metabolism, with an inherently long scanning time. Methods based on k-space undersampling and deep learning based reconstruction have been popularised to accelerate the scanning process. This work focuses on investigating how powerful transformers are for fast MRI by exploiting and comparing different novel network architectures. In particular, a generative adversarial network (GAN) based Swin transformer (ST-GAN) was introduced for fast MRI reconstruction. To further preserve the edge and texture information, an edge enhanced GAN based Swin transformer (EESGAN) and a texture enhanced GAN based Swin transformer (TES-GAN) were also developed, where a dual-discriminator GAN structure was applied. We compared our proposed GAN based transformers, a standalone Swin transformer and other convolutional neural network based GAN models in terms of the evaluation metrics PSNR, SSIM and FID. We showed that transformers work well for MRI reconstruction under different undersampling conditions. The utilisation of the GAN's adversarial structure improves the quality of images reconstructed when undersampled for 30% or higher.

[31]  arXiv:2201.09422 [pdf, ps, other]
Title: Variational Auto-Encoder Based Variability Encoding for Dysarthric Speech Recognition
Comments: Published in Interspeech 2021, 4808-4812
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

Dysarthric speech recognition is a challenging task due to acoustic variability and the limited amount of available data. The diverse conditions of dysarthric speakers account for the acoustic variability, which makes the variability difficult to model precisely. This paper presents a variational auto-encoder based variability encoder (VAEVE) to explicitly encode such variability for dysarthric speech. The VAEVE makes use of both phoneme information and a low-dimensional latent variable to reconstruct the input acoustic features, so that the latent variable is forced to encode the phoneme-independent variability. The stochastic gradient variational Bayes algorithm is applied to model the distribution for generating variability encodings, which are further used as auxiliary features for DNN acoustic modeling. Experimental results on the UASpeech corpus show that the VAEVE based variability encodings have a complementary effect to learning hidden unit contributions (LHUC) speaker adaptation. The systems using variability encodings consistently outperform the comparable baseline systems without them, and obtain an absolute word error rate (WER) reduction of up to 2.2% on dysarthric speech with the "Very low" intelligibility level, and up to 2% on the "Mixed" type of dysarthric speech with diverse or uncertain conditions.

[32]  arXiv:2201.09427 [pdf, other]
Title: Polyphone disambiguation and accent prediction using pre-trained language models in Japanese TTS front-end
Comments: 5 pages, 2 figures. Accepted to ICASSP2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

Although end-to-end text-to-speech (TTS) models can generate natural speech, challenges still remain when it comes to estimating sentence-level phonetic and prosodic information from raw text in Japanese TTS systems. In this paper, we propose a method for polyphone disambiguation (PD) and accent prediction (AP). The proposed method incorporates explicit features extracted from morphological analysis and implicit features extracted from pre-trained language models (PLMs). We use BERT and Flair embeddings as implicit features and examine how to combine them with explicit features. Our objective evaluation results showed that the proposed method improved the accuracy by 5.7 points in PD and 6.0 points in AP. Moreover, the perceptual listening test results confirmed that a TTS system employing our proposed model as a front-end achieved a mean opinion score close to that of synthesized speech with ground-truth pronunciation and accent in terms of naturalness.

[33]  arXiv:2201.09432 [pdf]
Title: Investigation of Deep Neural Network Acoustic Modelling Approaches for Low Resource Accented Mandarin Speech Recognition
Comments: Published in JOURNAL OF INTEGRATION TECHNOLOGY CNKI:SUN:JCJI.0.2015-06-003
Journal-ref: JOURNAL OF INTEGRATION TECHNOLOGY, Vol. 4, No. 6, Nov. 2015
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

The Mandarin Chinese language is known to be strongly influenced by a rich set of regional accents, while Mandarin speech with each accent is quite low resource. Hence, an important task in Mandarin speech recognition is to appropriately model the acoustic variabilities imposed by accents. In this paper, an investigation of implicit and explicit use of accent information on a range of deep neural network (DNN) based acoustic modelling techniques is conducted. Meanwhile, approaches to multi-accent modelling including multi-style training, multi-accent decision tree state tying, DNN tandem and multi-level adaptive network (MLAN) tandem hidden Markov model (HMM) modelling are combined and compared. On a low resource accented Mandarin speech recognition task consisting of four regional accents, an improved MLAN tandem HMM system explicitly leveraging the accent information is proposed; it significantly outperforms the baseline accent independent DNN tandem systems by 0.8%-1.5% absolute (6%-9% relative) in character error rate after sequence level discriminative training and adaptation.

[34]  arXiv:2201.09447 [pdf, ps, other]
Title: Prescribed-Time Safety Design for a Chain of Integrators
Subjects: Systems and Control (eess.SY)

Safety in dynamical systems is commonly pursued using control barrier functions (CBFs), which enforce safety constraints over the entire duration of a system's evolution. We propose a prescribed-time safety (PTSf) design which enforces safety only for a finite time of interest to the user. While traditional CBF designs would keep the system farther from the barrier than necessary, and longer than necessary, our PTSf design lets the system reach the barrier by the prescribed time and obey the operator's intent thereafter. To emphasize the capability of our design for safety constraints with high relative degrees, we focus our exposition on a chain of integrators where the safety condition is defined for the state furthest from the control input. In contrast to existing CBF-based methods for high-relative-degree constraints, our approach involves choosing explicitly specified gains (instead of class $\mathcal{K}$ functions), and, with the aid of backstepping, operates in the entirety of the original safe set with no additional restriction on the initial conditions. Since a QP is employed in the design, in addition to backstepping and CBFs with a PTSf property, we refer to our design as a QP-backstepping PT-CBF design. For illustration, we include a simulation for the double-integrator system.

[35]  arXiv:2201.09453 [pdf, other]
Title: Novel Nussbaum-Type Function based Safe Adaptive Distributed Consensus Control with Arbitrary Unknown Control Direction
Subjects: Systems and Control (eess.SY)

Existing Nussbaum function based methods for the consensus of multi-agent systems require (partially) identical unknown control directions of all agents and cause dangerous dramatic control shocks. This paper develops a novel saturated Nussbaum function to relax such limitations and proposes a Nussbaum function based control scheme for the consensus problem of multi-agent systems with arbitrary non-identical unknown control directions and safe control progress. First, a novel type of Nussbaum function with different frequencies is proposed in the form of saturated time-elongation functions, which provides a smoother and safer transient performance of the control progress. Furthermore, the novel Nussbaum function is employed to design distributed adaptive control algorithms for linearly parameterized multi-agent systems to achieve average consensus cooperatively without dramatic control shocks. Then, under an undirected connected communication topology, all the signals of the closed-loop systems are proved to be bounded and asymptotically convergent. Finally, two comparative numerical simulation examples are carried out to verify the effectiveness and the superiority of the proposed approach, which yields smaller control shock amplitudes than traditional Nussbaum methods.

[36]  arXiv:2201.09455 [pdf]
Title: Mars Entry Trajectory Planning with Range Discretization and Successive Convexification
Journal-ref: AIAA JGCD, 2022
Subjects: Systems and Control (eess.SY)

This paper develops a sequential convex programming approach for Mars entry trajectory planning by range discretization. To improve the accuracy of numerical integration, the range of entry trajectory is selected as the independent variable rather than time or energy. A dilation factor is employed to normalize the entry dynamics and integration interval of the performance index so that the difficult free-final-time programming problem can be converted to a fixed-final-range optimization problem. The bank angle rate with respect to the range is introduced as the new control input in order to decouple the control from the state and facilitate convexification of constraints on the bank angle and its rate. The nonlinear bank angle rate constraint is further relaxed into a linear one via inequality relaxation. Moreover, the nonconvex minimum-time performance index is convexified by regarding flight time as a state variable. Then, the Mars entry trajectory planning problem can be formulated into the framework of convex programming after linearization. By range discretization and successive convexification, the reformulated Mars entry trajectory planning problem is transcribed into a series of convex optimization sub-problems that can be sequentially solved using the convex programming algorithm. The virtual control and adaptive trust-region techniques are employed to improve the accuracy, robustness, and computation efficiency of the algorithm. Numerical simulations with comparative studies are presented to demonstrate the convergence performance and efficiency of the proposed algorithm.

[37]  arXiv:2201.09470 [pdf, other]
Title: Synthetic speech detection using meta-learning with prototypical loss
Subjects: Audio and Speech Processing (eess.AS); Cryptography and Security (cs.CR); Sound (cs.SD)

Recent works on speech spoofing countermeasures still lack generalization ability to unseen spoofing attacks. This is one of the key issues of the ASVspoof challenges, especially with the rapid development of diverse and high-quality spoofing algorithms. In this work, we address the generalizability of spoofing detection by proposing prototypical loss under the meta-learning paradigm to mimic the unseen test scenario during training. Prototypical loss with metric-learning objectives can learn the embedding space directly and emerges as a strong alternative to prevailing classification loss functions. We propose an anti-spoofing system based on the squeeze-excitation Residual network (SE-ResNet) architecture with prototypical loss. We demonstrate that the proposed single system without any data augmentation can achieve performance competitive with the recent best anti-spoofing systems on the ASVspoof 2019 logical access (LA) task. Furthermore, the proposed system with data augmentation outperforms the ASVspoof 2021 challenge best baseline in both the progress and evaluation phases of the LA task. On the ASVspoof 2019 and 2021 evaluation sets for the LA scenario, we attain relative improvements of 68.4% and 3.6% in min-tDCF compared to the challenge best baselines, respectively.

[38]  arXiv:2201.09494 [pdf, other]
Title: Data and knowledge-driven approaches for multilingual training to improve the performance of speech recognition systems of Indian languages
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)

We propose data and knowledge-driven approaches for multilingual training of the automated speech recognition (ASR) system for a target language by pooling speech data from multiple source languages. Exploiting the acoustic similarities between Indian languages, we implement two approaches. In phone/senone mapping, a deep neural network (DNN) learns to map senones or phones from one language to the others, and the transcriptions of the source languages are modified such that they can be used along with the target language data to train and fine-tune the target language ASR system. In the other approach, we model the acoustic information for all the languages simultaneously by training a multitask DNN (MTDNN) to predict the senones of each language in different output layers. The cross-entropy loss and the weight update procedure are modified such that, when a feature vector belongs to a particular language, only the shared layers and the output layer responsible for predicting the senone classes of that language are updated during training. In the low-resource setting (LRS), 40 hours of transcribed data each for Tamil, Telugu and Gujarati are used for training. The DNN based senone mapping technique gives relative improvements in word error rate (WER) of 9.66%, 7.2% and 15.21% over the baseline system for Tamil, Gujarati and Telugu, respectively. In the medium-resource setting (MRS), 160, 275 and 135 hours of data for Tamil, Kannada and Hindi are used, where the same technique gives better relative improvements of 13.94%, 10.28% and 27.24% for Tamil, Kannada and Hindi, respectively. The MTDNN with senone mapping based training in the LRS gives higher relative WER improvements of 15.0%, 17.54% and 16.06% for Tamil, Gujarati and Telugu, respectively, whereas in the MRS we see improvements of 21.24%, 21.05% and 30.17% for Tamil, Kannada and Hindi, respectively.

[39]  arXiv:2201.09499 [pdf, other]
Title: Estimation of Bistatic Radar Detection Performance Under Discrete Clutter Conditions Using Stochastic Geometry
Subjects: Signal Processing (eess.SP)

We propose a metric called the bistatic radar detection coverage probability to evaluate the detection performance of a bistatic radar under discrete clutter conditions. Such conditions are commonly encountered in indoor and outdoor environments where passive radar receivers are deployed with opportunistic illuminators. Backscatter and multipath from the radar environment give rise to ghost targets and point clutter responses in the radar signatures, resulting in deterioration of the detection performance. In our work, we model the clutter points as a Poisson point process to account for the diversity in their number and spatial distribution. Using stochastic geometry formulations, we provide an analytical framework to estimate the probability that the signal to clutter and noise ratio from a target at any particular position in the bistatic radar plane is above a predefined threshold. Using the metric, we derive key radar system perspectives regarding the radar performance under noise and clutter limited conditions; the range at which the bistatic radar framework can be approximated to a monostatic framework; and the optimal radar transmitted power and bandwidth. Our theoretical results are validated with Monte Carlo simulations.

[40]  arXiv:2201.09522 [pdf, other]
Title: Accelerated Intravascular Ultrasound Imaging using Deep Reinforcement Learning
Comments: 5 pages, 3 figures, conference
Journal-ref: ICASSP 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Subjects: Signal Processing (eess.SP); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Intravascular ultrasound (IVUS) offers a unique perspective in the treatment of vascular diseases by creating a sequence of ultrasound slices acquired from within the vessel. However, unlike conventional hand-held ultrasound, the thin catheter only provides room for a small number of physical channels for signal transfer from a transducer array at the tip. For continued improvement of image quality and frame rate, we present the use of deep reinforcement learning to deal with the current physical information bottleneck. Valuable inspiration has come from the field of magnetic resonance imaging (MRI), where learned acquisition schemes have brought significant acceleration in image acquisition at competitive image quality. To efficiently accelerate IVUS imaging, we propose a framework that utilizes deep reinforcement learning for an optimal adaptive acquisition policy on a per-frame basis, enabled by actor-critic methods and Gumbel top-$K$ sampling.

[41]  arXiv:2201.09529 [pdf, other]
Title: Small-Signal Stability Analysis of Numerical Integration Methods
Subjects: Systems and Control (eess.SY); Numerical Analysis (math.NA)

The paper provides a novel framework to study the accuracy and stability of numerical integration schemes when employed for the time domain simulation of power systems. A matrix pencil-based approach is adopted to evaluate the error between the dynamic modes of the power system and the modes of the approximated discrete-time system arising from the application of the numerical method. The proposed approach can provide meaningful insights on how different methods compare to each other when applied to a power system, while being general enough to be systematically utilized for, in principle, any numerical method. The framework is illustrated for a handful of well-known explicit and implicit methods, while simulation results are presented based on the WSCC 9-bus system, as well as on a 1,479-bus dynamic model of the All-Island Irish Transmission System.
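
As a rough illustration of the kind of comparison this abstract describes, the sketch below maps a single continuous-time mode to its discrete-time counterpart for three textbook methods, using the scalar test equation x' = lam * x. It is not the paper's matrix pencil approach; the function names, the example mode and the step size are assumptions chosen only for illustration.

```python
import cmath


def discrete_mode(lam: complex, h: float, method: str) -> complex:
    """Discrete-time mode z obtained when a one-step method with step h
    is applied to the scalar test equation x' = lam * x."""
    if method == "forward_euler":
        return 1 + h * lam
    if method == "backward_euler":
        return 1 / (1 - h * lam)
    if method == "trapezoidal":
        return (1 + h * lam / 2) / (1 - h * lam / 2)
    raise ValueError(f"unknown method: {method}")


def equivalent_continuous_mode(z: complex, h: float) -> complex:
    """Continuous-time mode that the discrete mode z corresponds to
    (principal branch of the logarithm)."""
    return cmath.log(z) / h


if __name__ == "__main__":
    lam = complex(-0.1, 5.0)  # hypothetical lightly damped oscillatory mode
    h = 0.05                  # hypothetical integration step in seconds
    for method in ("forward_euler", "backward_euler", "trapezoidal"):
        z = discrete_mode(lam, h, method)
        lam_hat = equivalent_continuous_mode(z, h)
        # |z| < 1 means the discrete mode is stable; the gap between
        # lam_hat and lam is the distortion introduced by the method.
        print(method, abs(z), lam_hat - lam)
```

For a stable mode (negative real part), a discrete mode with magnitude above one indicates that the method has turned a stable mode into a numerically unstable one, while a magnitude well below that of the exact mapping exp(h * lam) indicates added numerical damping.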

[42]  arXiv:2201.09579 [pdf, other]
Title: AutoSeg -- Steering the Inductive Biases for Automatic Pathology Segmentation
Comments: 8 pages, 3 figures, part of the MICCAI MOOD Challenge 2021
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

In medical imaging, un-, semi-, or self-supervised pathology detection is often approached with anomaly- or out-of-distribution detection methods, whose inductive biases are not intentionally directed towards detecting pathologies and are therefore sub-optimal for this task. To tackle this problem, we propose AutoSeg, an engine that can generate diverse artificial anomalies that resemble the properties of real-world pathologies. Our method can accurately segment unseen artificial anomalies and outperforms existing methods for pathology detection on a challenging real-world dataset of chest X-ray images. We experimentally evaluate our method on the Medical Out-of-Distribution Analysis Challenge 2021.

[43]  arXiv:2201.09586 [pdf, other]
Title: PickNet: Real-Time Channel Selection for Ad Hoc Microphone Arrays
Comments: 5 pages, 2 figures, 2 tables, accepted for presentation at ICASSP 2022
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)

This paper proposes PickNet, a neural network model for real-time channel selection for an ad hoc microphone array consisting of multiple recording devices like cell phones. Assuming at most one person to be vocally active at each time point, PickNet identifies the device that is spatially closest to the active person for each time frame by using a short spectral patch of just hundreds of milliseconds. The model is applied to every time frame, and the short time frame signals from the selected microphones are concatenated across the frames to produce an output signal. As the personal devices are usually held close to their owners, the output signal is expected to have higher signal-to-noise and direct-to-reverberation ratios on average than the input signals. Since PickNet utilizes only limited acoustic context at each time frame, the system using the proposed model works in real time and is robust to changes in acoustic conditions. Speech recognition-based evaluation was carried out by using real conversational recordings obtained with various smartphones. The proposed model yielded significant gains in word error rate with limited computational cost over systems using a block-online beamformer and a single distant microphone.
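
The select-and-concatenate step described in this abstract can be sketched as follows. This is only an assumed illustration: PickNet itself is a neural network, whereas the sketch substitutes a simple frame-energy score as a stand-in, and the names `select_channels` and `frame_energy` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Frame = Sequence[float]


def frame_energy(frame: Frame) -> float:
    """Stand-in score (sum of squared samples); not the PickNet model."""
    return sum(x * x for x in frame)


def select_channels(
    frames_per_device: List[List[Frame]],
    score: Callable[[Frame], float] = frame_energy,
) -> Tuple[List[int], List[float]]:
    """For each time frame, pick the device whose frame scores highest and
    concatenate the picked frames into a single output signal."""
    n_frames = min(len(device) for device in frames_per_device)
    picks: List[int] = []
    output: List[float] = []
    for t in range(n_frames):
        candidates = [device[t] for device in frames_per_device]
        best = max(range(len(candidates)), key=lambda i: score(candidates[i]))
        picks.append(best)
        output.extend(candidates[best])
    return picks, output


if __name__ == "__main__":
    # Two hypothetical devices, two frames each, two samples per frame.
    device0 = [[0.1, 0.1], [0.9, 0.8]]
    device1 = [[0.5, 0.4], [0.2, 0.1]]
    picks, signal = select_channels([device0, device1])
    print(picks, signal)
```

Because each decision uses only the current short frame, a selector of this shape can run frame by frame, which is the property the abstract relies on for real-time operation.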

[44]  arXiv:2201.09693 [pdf, other]
Title: Shape-consistent Generative Adversarial Networks for multi-modal Medical segmentation maps
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Image translation across domains for unpaired datasets has gained interest and great improvement lately. In medical imaging, there are multiple imaging modalities, with very different characteristics. Our goal is to use cross-modality adaptation between CT and MRI whole cardiac scans for semantic segmentation. We present a segmentation network using synthesised cardiac volumes for extremely limited datasets. Our solution is based on a 3D cross-modality generative adversarial network to share information between modalities and generate synthesized data using unpaired datasets. Our network utilizes semantic segmentation to improve generator shape consistency, thus creating more realistic synthesised volumes to be used when re-training the segmentation network. We show that improved segmentation can be achieved on small datasets when using spatial augmentations to improve a generative adversarial network. These augmentations improve the generator capabilities, thus enhancing the performance of the Segmentor. Using only 16 CT and 16 MRI cardiovascular volumes, improved results are shown over other segmentation methods while using the suggested architecture.

[45]  arXiv:2201.09707 [pdf, other]
Title: LCOE-based Pricing for DLT-enabled Local Energy Trading Platforms
Comments: 9 pages, 12 figures
Subjects: Systems and Control (eess.SY)

Support schemes like the Feed-in-Tariff have for many years been an important driver for the deployment of distributed energy resources, and the transition from consumerism to prosumerism. This democratization and decarbonization of the energy system has led to both challenges and opportunities for the system operators, paving the way for emerging concepts like local energy markets. The Feed-in-Tariff approach has often been assumed as the lower economic bound for a prosumer's willingness to participate in such markets but is now being phased out in several countries. In this paper, a new pricing mechanism based on the Levelized Cost of Electricity is proposed, with the intention of securing profitability for the prosumers, as well as creating a transparent and fair price for all market participants. The mechanism is designed to function on a Distributed Ledger Technology-based platform and is further set up from a holistic perspective, defining the market framework as interactions in a Cyber-Physical-Social-System. Schemes based on both fixed and variable contracts with the wholesale supplier are analyzed and compared with the conventional Feed-in-Tariff, showing benefits for both prosumers and consumers.

[46]  arXiv:2201.09716 [pdf]
Title: Pedestrian Dead Reckoning System using Quasi-static Magnetic Field Detection
Subjects: Signal Processing (eess.SP)

A Kalman filter-based Inertial Navigation System (INS) is a reliable and efficient method for estimating the position of a pedestrian indoors. The classical INS-based methodology, called IEZ (INS-EKF-ZUPT), uses an Extended Kalman Filter (EKF) and Zero velocity UPdaTing (ZUPT) to calculate the position and attitude of a person. However, the heading error, a key factor for the whole Pedestrian Dead Reckoning (PDR) system, is unobservable in an IEZ-based PDR system. An Electronic Compass (EC) algorithm is a valid way to minimize this error, but magnetic disturbances can degrade it severely. In this paper, the Quasi-static Magnetic field Detection (QMD) method is proposed to detect the pure magnetic field and then select either the EC algorithm or the Heuristic heading Drift Reduction (HDR) algorithm according to the detection result, so that the two methods complement each other. The QMD, EC, and HDR algorithms are integrated into the IEZ framework to form a new PDR solution named Advanced IEZ (AIEZ).
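The switching idea in this abstract (use the compass heading when the magnetic field looks undisturbed, otherwise fall back to HDR) can be sketched as follows. The abstract does not give the actual QMD test, so the detection rule here is an assumption: the field is treated as quasi-static when the spread of its magnitude over a short window stays below a threshold. The function names and the threshold value are hypothetical.

```python
import math


def is_quasi_static(mag_window, max_spread_ut=1.0):
    """Assumed detection rule (not taken from the paper): the field is
    quasi-static if its magnitude varies by at most max_spread_ut
    microtesla across the window of (x, y, z) magnetometer samples."""
    norms = [math.sqrt(x * x + y * y + z * z) for x, y, z in mag_window]
    return max(norms) - min(norms) <= max_spread_ut


def select_heading_algorithm(mag_window, max_spread_ut=1.0):
    """Return "EC" when the compass can be trusted, "HDR" otherwise."""
    return "EC" if is_quasi_static(mag_window, max_spread_ut) else "HDR"


steady = [(20.0, 0.0, 40.0)] * 5                      # undisturbed field
disturbed = [(20.0, 0.0, 40.0), (35.0, 5.0, 40.0),    # e.g. near steel furniture
             (18.0, -3.0, 44.0)]
print(select_heading_algorithm(steady), select_heading_algorithm(disturbed))
```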

[47]  arXiv:2201.09748 [pdf]
Title: A Critical Review of Baseband Architectures for CubeSats Communication Systems
Subjects: Signal Processing (eess.SP); Systems and Control (eess.SY)

Small satellite communications have recently entered a period of massive interest driven by emerging space applications. CubeSats are particularly attractive due to their low development costs, which makes them promising candidates for a central role in the global wireless communication sector with numerous applications. Moreover, constellations of CubeSats in low-earth orbits can meet the increasing demand for flexible, low-cost, high-speed connectivity with global coverage. However, this requires innovative solutions to overcome the significant challenges that face high-data-rate, low-power space communications. This paper provides a comprehensive and critical review of the design and architecture of recent CubeSat communication systems with a particular focus on their baseband architectures. The literature is surveyed in detail to identify all baseband design, testing, and demonstration stages as well as to accurately describe the system architectures and communication protocols. The reliability, performance, data rate, and power consumption of the reviewed systems are critically evaluated to understand the limitations of current CubeSat systems and identify directions for future development. It is concluded that CubeSat communication systems still face many challenges, namely the development of energy-efficient high-speed modems that satisfy CubeSat requirements. Nevertheless, there are several promising directions for improvement, such as improved coding algorithms, Field Programmable Gate Arrays, multiple access techniques, beamforming, advanced antennas, and a transition to higher frequency bands. By providing a concrete summary of current CubeSat communication systems, critically evaluating their features and limitations, and offering insights into potential improvements, the review should help CubeSat developers build more efficient systems with higher data rates.

[48]  arXiv:2201.09805 [pdf, other]
Title: Non-Linear Dynamic Inversion with Actuator Dynamics: an Incremental Control Perspective
Comments: 13 pages, 7 figures
Subjects: Systems and Control (eess.SY)

In this paper, we derive a Nonlinear Dynamic Inversion (NDI) control law for a non-linear system with first order linear actuators, and compare it to Incremental Nonlinear Dynamic Inversion (INDI), which has gained popularity in recent years. It is shown that for first order actuator dynamics, INDI approximates the corresponding NDI control law arbitrarily well under the condition of sufficiently fast actuators. If the actuator bandwidth is low compared to changes in the states, the derived NDI control law has the following advantages compared to INDI: 1) compensation of state derivative terms 2) well defined error dynamics and 3) exact tracking of a reference model, independent of error controller gains in nominal conditions. The comparison of the INDI control law with the well established control design method NDI adds to the understanding of incremental control. It is additionally shown how to quantify the deficiency of the INDI control law with respect to the exact NDI law for actuators with finite bandwidth. The results are confirmed through simulation results of the rolling motion of a fixed wing aircraft.

[49]  arXiv:2201.09816 [pdf]
Title: Pseudospectral continuation for aeroelastic stability analysis
Authors: Arion Pons
Comments: Technical note
Subjects: Systems and Control (eess.SY)

This technical note is concerned with aeroelastic flutter problems: the analysis of aeroelastic systems undergoing airspeed-dependent dynamic instability. Existing continuation methods for parametric stability analysis are based on marching along an airspeed parameter until the flutter point is found - an approach which may waste computational effort on low-airspeed system behavior, before a flutter point is located and characterized. Here, we describe a pseudospectral continuation approach which instead marches outwards from the system's flutter points, from points of instability to points of increasing damping, allowing efficient characterization of the subcritical and supercritical behavior of the system. This approach ties together aeroelastic stability analysis and abstract linear algebra, and provides efficient methods for computing practical aeroelastic stability properties - for instance, flight envelopes based on maximum modal damping, and the location of borderline-stable zones.

[50]  arXiv:2201.09834 [pdf, other]
Title: Deep Decoding of $\ell_\infty$-coded Light Field Images
Subjects: Image and Video Processing (eess.IV)

To enrich the functionalities of traditional cameras, light field cameras record both the intensity and direction of light rays, so that images can be rendered with user-defined camera parameters via computations. The added capability and flexibility are gained at the cost of gathering typically more than $100\times$ greater amount of information than conventional images. To cope with this issue, several light field compression schemes have been introduced. However, their ways of exploiting correlations of multidimensional light field data are complex and are hence not suited for inexpensive light field cameras. In this work, we propose a novel $\ell_\infty$-constrained light-field image compression system that has a very low-complexity DPCM encoder and a CNN-based deep decoder. Targeting high-fidelity reconstruction, the CNN decoder capitalizes on the $\ell_\infty$-constraint and light field properties to remove the compression artifacts and achieves significantly better performance than existing state-of-the-art $\ell_2$-based light field compression methods.
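To make the $\ell_\infty$ constraint concrete, here is a minimal sketch of generic near-lossless DPCM on a 1-D integer signal: the prediction residual is quantized with step $2\tau+1$, which bounds the reconstruction error of every sample by $\tau$. This is the standard construction, not the encoder of the paper; the previous-sample predictor, the function names, and the omission of entropy coding and pixel-range clipping are assumptions for illustration.

```python
def dpcm_encode(samples, tau):
    """Quantize prediction residuals so that every reconstructed sample
    is within +/- tau of the original (the l-infinity bound)."""
    step = 2 * tau + 1
    indices = []
    prev = 0  # the predictor is the previous *reconstructed* sample
    for x in samples:
        residual = x - prev
        q = (abs(residual) + tau) // step  # round to the nearest multiple of step
        if residual < 0:
            q = -q
        indices.append(q)
        prev += q * step  # track exactly what the decoder will reconstruct
    return indices


def dpcm_decode(indices, tau):
    """Invert dpcm_encode by accumulating the dequantized residuals."""
    step = 2 * tau + 1
    out = []
    prev = 0
    for q in indices:
        prev += q * step
        out.append(prev)
    return out


signal = [10, 12, 15, 100, 98, 0, 255]
decoded = dpcm_decode(dpcm_encode(signal, tau=2), tau=2)
max_error = max(abs(a - b) for a, b in zip(signal, decoded))
print(decoded, max_error)
```

Because the encoder needs only one subtraction and one integer division per sample, the complexity stays on the decoder side, which is the trade-off the abstract describes.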

[51]  arXiv:2201.09851 [pdf, other]
Title: Hyperspectral Image Super-resolution with Deep Priors and Degradation Model Inversion
Comments: Proc. IEEE Int. Conf. on Acoust, Speech, Signal Process. (ICASSP), to be published. Manuscript submitted October 6th, 2021; revised January 8th, 2022; accepted January 22nd, 2022
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

To overcome inherent hardware limitations of hyperspectral imaging systems with respect to their spatial resolution, fusion-based hyperspectral image (HSI) super-resolution is attracting increasing attention. This technique aims to fuse a low-resolution (LR) HSI and a conventional high-resolution (HR) RGB image in order to obtain an HR HSI. Recently, deep learning architectures have been used to address the HSI super-resolution problem and have achieved remarkable performance. However, they ignore the degradation model even though this model has a clear physical interpretation and may contribute to improve the performance. We address this problem by proposing a method that, on the one hand, makes use of the linear degradation model in the data-fidelity term of the objective function and, on the other hand, utilizes the output of a convolutional neural network for designing a deep prior regularizer in spectral and spatial gradient domains. Experiments show the performance improvement achieved with this strategy.

[52]  arXiv:2201.09859 [pdf, ps, other]
Title: Graph Reinforcement Learning for Wireless Control Systems: Large-Scale Resource Allocation over Interference Channels
Subjects: Signal Processing (eess.SP)

Modern control systems routinely employ wireless networks to exchange information between spatially distributed plants, actuators and sensors. With wireless networks defined by random, rapidly changing transmission conditions that challenge assumptions commonly held in the design of control systems, proper allocation of communication resources is essential to achieve reliable operation. Designing resource allocation policies, however, is challenging, motivating recent works to successfully exploit deep learning and deep reinforcement learning techniques to design resource allocation and scheduling policies for wireless control systems. As the number of learnable parameters in a neural network grows with the size of the input signal, deep reinforcement learning algorithms may fail to scale, limiting the immediate generalization of such scheduling and resource allocation policies to large-scale systems. The interference and fading patterns among plants and controllers in the network, on the other hand, induce a time-varying communication graph that can be used to construct policy representations based on graph neural networks (GNNs), with the number of learnable parameters now independent of the number of plants in the network. That invariance to the number of nodes is key to design scalable and transferable resource allocation policies, which can be trained with reinforcement learning. Through extensive numerical experiments we show that the proposed graph reinforcement learning approach yields policies that not only outperform baseline solutions and deep reinforcement learning based policies in large-scale systems, but that can also be transferred across networks of varying size.

[53]  arXiv:2201.09867 [pdf]
Title: Importance of Preprocessing in Histopathology Image Classification Using Deep Convolutional Neural Network
Comments: 6 Pages
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

The aim of this study is to propose an alternative, hybrid method for diagnosing disease from histopathology images taken from animals with paratuberculosis and from animals with intact intestines. The hybrid method combines image processing and deep learning for better results. Reliable disease detection from histopathology images is an open problem in medical image processing, and alternative solutions need to be developed. In this context, 520 histopathology images were collected in a joint study with Burdur Mehmet Akif Ersoy University, Faculty of Veterinary Medicine, Department of Pathology. Manually examining and interpreting these images requires expertise and a lot of processing time. For this reason, veterinarians, especially newly recruited physicians, have a great need for imaging and computer vision systems in the development of detection and treatment methods for this disease. The proposed method uses CLAHE together with image processing. After this preprocessing, the diagnosis is made by classification with a convolutional neural network based on the VGG-16 architecture. The method uses a completely original image dataset. Two types of systems were evaluated. While the F1 score was 93% for classification without preprocessing, it was 98% for classification preprocessed with the CLAHE method.

[54]  arXiv:2201.09873 [pdf, other]
Title: Transformers in Medical Imaging: A Survey
Comments: 41 pages
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Following unprecedented success on natural language tasks, Transformers have been successfully applied to several computer vision problems, achieving state-of-the-art results and prompting researchers to reconsider the supremacy of convolutional neural networks (CNNs) as de facto operators. Capitalizing on these advances in computer vision, the medical imaging field has also witnessed growing interest in Transformers, which can capture global context, in contrast to CNNs with local receptive fields. Inspired by this transition, in this survey we attempt to provide a comprehensive review of the applications of Transformers in medical imaging covering various aspects, ranging from recently proposed architectural designs to unsolved issues. Specifically, we survey the use of Transformers in medical image segmentation, detection, classification, reconstruction, synthesis, registration, clinical report generation, and other tasks. For each of these applications, we develop a taxonomy, identify application-specific challenges, provide insights for solving them, and highlight recent trends. Further, we provide a critical discussion of the field's current state as a whole, including the identification of key challenges and open problems and an outline of promising future directions. We hope this survey will ignite further interest in the community and provide researchers with an up-to-date reference regarding applications of Transformer models in medical imaging. Finally, to cope with the rapid development in this field, we intend to regularly update the relevant latest papers and their open-source implementations at https://github.com/fahadshamshad/awesome-transformers-in-medical-imaging.

[55]  arXiv:2201.09875 [pdf, other]
Title: A Bayesian Permutation training deep representation learning method for speech enhancement with variational autoencoder
Comments: Accepted by ICASSP 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

Recently, the variational autoencoder (VAE), a deep representation learning (DRL) model, has been used to perform speech enhancement (SE). However, to the best of our knowledge, current VAE-based SE methods only apply the VAE to model the speech signal, while noise is modeled using the traditional non-negative matrix factorization (NMF) model. One of the most important reasons for using NMF is that these VAE-based methods cannot disentangle the speech and noise latent variables from the observed signal. Based on Bayesian theory, this paper derives a novel variational lower bound for the VAE, which ensures that the VAE can be trained in a supervised manner and can disentangle speech and noise latent variables from the observed signal. This means that the proposed method can apply the VAE to model both speech and noise signals, which is fundamentally different from previous VAE-based SE works. More specifically, the proposed DRL method can learn to impose speech and noise signal priors on different sets of latent variables for SE. The experimental results show that the proposed method can not only disentangle speech and noise latent variables from the observed signal but also obtain a higher scale-invariant signal-to-distortion ratio and speech quality score than a similar deep neural network (DNN) based SE method.
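The scale-invariant signal-to-distortion ratio mentioned as an evaluation metric has a compact standard definition: project the estimate onto the reference, then compare the energy of that projection with the energy of what is left over. The sketch below follows this common definition on plain Python lists; the function name and the handling of the degenerate cases are assumptions, and the abstract does not state which implementation the authors used.

```python
import math


def si_sdr(reference, estimate):
    """Scale-invariant SDR in dB (assumes equal-length sequences and a
    reference with non-zero energy)."""
    dot = sum(r * e for r, e in zip(reference, estimate))
    ref_energy = sum(r * r for r in reference)
    alpha = dot / ref_energy                    # optimal scaling of the reference
    target = [alpha * r for r in reference]     # part of the estimate explained by it
    target_energy = sum(t * t for t in target)
    noise_energy = sum((e - t) ** 2 for e, t in zip(estimate, target))
    if noise_energy == 0.0:
        return math.inf                         # estimate is a scaled copy of the reference
    if target_energy == 0.0:
        return -math.inf                        # estimate is orthogonal to the reference
    return 10.0 * math.log10(target_energy / noise_energy)


clean = [1.0, -2.0, 3.0, 0.5]
enhanced = [1.1, -1.8, 2.9, 0.7]
score = si_sdr(clean, enhanced)
rescaled_score = si_sdr(clean, [3.0 * x for x in enhanced])
print(score, rescaled_score)
```

Rescaling the estimate leaves the score unchanged up to floating-point rounding, which is the property that gives the metric its name.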

Cross-lists for Tue, 25 Jan 22

[56]  arXiv:2201.08839 (cross-list from stat.ME) [pdf, ps, other]
Title: Dynamic Infection Spread Model Based Group Testing
Subjects: Methodology (stat.ME); Data Structures and Algorithms (cs.DS); Information Theory (cs.IT); Signal Processing (eess.SP); Populations and Evolution (q-bio.PE)

We study a dynamic infection spread model, inspired by the discrete time SIR model, where infections are spread via non-isolated infected individuals. While infection keeps spreading over time, limited-capacity testing is performed at each time instance as well. In contrast to the classical, static group testing problem, the objective in our setup is not to find the minimum number of tests required to identify the infection status of every individual in the population, but to control the infection spread by detecting and isolating the infections over time using the given, limited number of tests. In order to analyze the performance of the proposed algorithms, we focus on the mean-sense analysis of the number of individuals that remain non-infected throughout the process of controlling the infection. We propose two dynamic algorithms that both use a given, limited number of tests to identify and isolate the infections over time, while the infection spreads. While the first algorithm is a dynamic randomized individual testing algorithm, in the second algorithm we employ a group testing approach similar to the original work of Dorfman. By considering weak versions of our algorithms, we obtain lower bounds for the performance of our algorithms. Finally, we implement our algorithms and run simulations to gather numerical results and compare our algorithms and theoretical approximation results under different sets of system parameters.
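For readers unfamiliar with the Dorfman scheme that the second algorithm draws on, the classical static version is easy to quantify: pool $k$ samples, test the pool once, and retest each member individually only if the pool is positive. With independent infections at prevalence $p$, the expected number of tests per person is $1/k + 1 - (1-p)^k$. The sketch below evaluates this static formula only; it does not model the dynamic spread or the algorithms of the paper, and the function names are illustrative.

```python
def dorfman_tests_per_person(prevalence, group_size):
    """Expected tests per individual under two-stage Dorfman pooling,
    assuming independent infections and error-free tests."""
    if group_size == 1:
        return 1.0  # pooling a single sample is just individual testing
    p_pool_negative = (1.0 - prevalence) ** group_size
    # One pooled test shared by the group, plus one retest per member
    # whenever the pool comes back positive.
    return 1.0 / group_size + (1.0 - p_pool_negative)


def best_group_size(prevalence, max_size=100):
    """Pool size in 1..max_size that minimizes the expected tests per person."""
    return min(range(1, max_size + 1),
               key=lambda k: dorfman_tests_per_person(prevalence, k))


k = best_group_size(0.01)
print(k, dorfman_tests_per_person(0.01, k))
```

At 1% prevalence, pooling cuts the expected testing load to roughly a fifth of individual testing, which is why a fixed test budget goes further with group tests.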

[57]  arXiv:2201.08938 (cross-list from cs.CV) [pdf, other]
Title: Adaptive DropBlock Enhanced Generative Adversarial Networks for Hyperspectral Image Classification
Journal-ref: in IEEE Transactions on Geoscience and Remote Sensing, vol. 59, no. 6, pp. 5040-5053, June 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

In recent years, hyperspectral image (HSI) classification based on generative adversarial networks (GANs) has achieved great progress. GAN-based classification methods can mitigate the limited training sample dilemma to some extent. However, several studies have pointed out that existing GAN-based HSI classification methods are heavily affected by the imbalanced training data problem. The discriminator in a GAN always contradicts itself and tries to associate fake labels with the minority-class samples, and thus impairs the classification performance. Another critical issue is mode collapse in GAN-based methods. The generator is only capable of producing samples within a narrow scope of the data space, which severely hinders the advancement of GAN-based HSI classification methods. In this paper, we propose an Adaptive DropBlock-enhanced Generative Adversarial Network (ADGAN) for HSI classification. First, to solve the imbalanced training data problem, we adjust the discriminator to be a single classifier, so that it does not contradict itself. Second, an adaptive DropBlock (AdapDrop) is proposed as a regularization method employed in the generator and discriminator to alleviate the mode collapse issue. AdapDrop generates drop masks with adaptive shapes instead of a fixed-size region, which alleviates the limitations of DropBlock in dealing with ground objects of various shapes. Experimental results on three HSI datasets demonstrate that the proposed ADGAN achieves superior performance over state-of-the-art GAN-based methods. Our codes are available at https://github.com/summitgao/HC_ADGAN

[58]  arXiv:2201.08954 (cross-list from cs.CV) [pdf, other]
Title: Change Detection from Synthetic Aperture Radar Images via Graph-Based Knowledge Supplement Network
Comments: Accepted by IEEE JSTARS
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Synthetic aperture radar (SAR) image change detection is a vital yet challenging task in the field of remote sensing image analysis. Most previous works adopt a self-supervised method which uses pseudo-labeled samples to guide subsequent training and testing. However, deep networks commonly require many high-quality samples for parameter optimization. The noise in pseudo-labels inevitably affects the final change detection performance. To solve the problem, we propose a Graph-based Knowledge Supplement Network (GKSNet). To be more specific, we extract discriminative information from the existing labeled dataset as additional knowledge, to suppress the adverse effects of noisy samples to some extent. Afterwards, we design a graph transfer module to distill contextual information attentively from the labeled dataset to the target dataset, which bridges feature correlation between datasets. To validate the proposed method, we conducted extensive experiments on four SAR datasets, which demonstrated the superiority of the proposed GKSNet as compared to several state-of-the-art baselines. Our codes are available at https://github.com/summitgao/SAR_CD_GKSNet.

[59]  arXiv:2201.08996 (cross-list from cs.CV) [pdf, other]
Title: Linear Array Network for Low-light Image Enhancement
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Convolutional neural network (CNN) based methods have dominated low-light image enhancement tasks due to their outstanding performance. However, the convolution operation is based on a local sliding window mechanism, which makes it difficult to construct long-range dependencies in the feature maps. Meanwhile, self-attention based global relationship aggregation methods have been widely used in computer vision, but these methods struggle to handle high-resolution images because of their high computational complexity. To solve this problem, this paper proposes a Linear Array Self-attention (LASA) mechanism, which uses only two 2-D feature encodings to construct 3-D global weights and then refines the feature maps generated by convolution layers. Based on LASA, the Linear Array Network (LAN) is proposed, which is superior to the existing state-of-the-art (SOTA) methods in both RGB and RAW based low-light enhancement tasks while using fewer parameters. The code is released at https://github.com/cuiziteng/LASA_enhancement.

[60]  arXiv:2201.09001 (cross-list from cs.IT) [pdf, ps, other]
Title: Reconfigurable Intelligent Surfaces with Outdated Channel State Information: Centralized vs. Distributed Deployments
Comments: to appear in IEEE Transactions on Communications, 2022
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

In this paper, we investigate the performance of an RIS-aided wireless communication system subject to outdated channel state information that may operate in both the near- and far-field regions. In particular, we take two RIS deployment strategies into consideration: (i) the centralized deployment, where all the reflecting elements are installed on a single RIS and (ii) the distributed deployment, where the same number of reflecting elements are placed on multiple RISs. For both deployment strategies, we derive accurate closed-form approximations for the ergodic capacity, and we introduce tight upper and lower bounds for the ergodic capacity to obtain useful design insights. From this analysis, we unveil that an increase of the transmit power, the Rician-K factor, the accuracy of the channel state information and the number of reflecting elements help improve the system performance. Moreover, we prove that the centralized RIS-aided deployment may achieve a higher ergodic capacity as compared with the distributed RIS-aided deployment when the RIS is located near the base station or near the user. In different setups, on the other hand, we prove that the distributed deployment outperforms the centralized deployment. Finally, the analytical results are verified by using Monte Carlo simulations.

[61]  arXiv:2201.09031 (cross-list from cs.IT) [pdf, ps, other]
Title: Manifold Optimization Based Multi-user Rate Maximization Aided by Intelligent Reflecting Surface
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP); Systems and Control (eess.SY)

In this work, two problems associated with a downlink multi-user system are considered with the aid of intelligent reflecting surface (IRS): weighted sum-rate maximization and weighted minimal-rate maximization. For the first problem, a novel DOuble Manifold ALternating Optimization (DOMALO) algorithm is proposed by exploiting the matrix manifold theory and introducing the beamforming matrix and reflection vector using complex sphere manifold and complex oblique manifold, respectively, which incorporate the inherent geometrical structure and the required constraint. A smooth double manifold alternating optimization (S-DOMALO) algorithm is then developed based on the Dinkelbach-type algorithm and smooth exponential penalty function for the second problem. Finally, possible cooperative beamforming gain between IRSs and the IRS phase shift with limited resolution is studied, providing a reference for practical implementation. Numerical results show that our proposed algorithms can significantly outperform the benchmark schemes.

[62]  arXiv:2201.09032 (cross-list from cs.SD) [pdf, other]
Title: NAS-VAD: Neural Architecture Search for Voice Activity Detection
Comments: Submitted to IEEE Signal Processing Letters
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

The need for automatic design of deep neural networks has led to the emergence of neural architecture search (NAS), which has generated models outperforming manually-designed models. However, most existing NAS frameworks are designed for image processing tasks, and lack structures and operations effective for voice activity detection (VAD) tasks. To discover improved VAD models through automatic design, we present the first work that proposes a NAS framework optimized for the VAD task. The proposed NAS-VAD framework expands the existing search space with the attention mechanism while incorporating the compact macro-architecture with fewer cells. The experimental results show that the models discovered by NAS-VAD outperform the existing manually-designed VAD models in various synthetic and real-world datasets. Our code and models are available at https://github.com/daniel03c1/NAS_VAD.

[63]  arXiv:2201.09057 (cross-list from cs.NI) [pdf, other]
Title: Multi-Agent Reinforcement Learning for Distributed Joint Communication and Computing Resource Allocation over Cell-Free Massive MIMO-enabled Mobile Edge Computing Network
Subjects: Networking and Internet Architecture (cs.NI); Systems and Control (eess.SY)

To support the newly introduced multimedia services with ultra-low latency and extensive computation requirements, resource-constrained end user devices should utilize the ubiquitous computing resources available at network edge for augmenting on-board (local) processing with edge computing. In this regard, the capability of cell-free massive MIMO to provide reliable access links by guaranteeing uniform quality of service without cell edge can be exploited for seamless parallel processing. Taking this into account, we consider a cell-free massive MIMO-enabled mobile edge network to meet the stringent requirements of the advanced services. For the considered mobile edge network, we formulate a joint communication and computing resource allocation (JCCRA) problem with the objective of minimizing energy consumption of the users while meeting the tight delay constraints. We then propose a fully distributed cooperative solution approach based on multiagent deep deterministic policy gradient (MADDPG) algorithm. The simulation results demonstrate that the performance of the proposed distributed approach has converged to that of a centralized deep deterministic policy gradient (DDPG)-based target benchmark, while alleviating the large overhead associated with the latter. Furthermore, it has been shown that our approach significantly outperforms heuristic baselines in terms of energy efficiency, roughly up to 5 times less total energy consumption.

[64]  arXiv:2201.09071 (cross-list from cs.LG) [pdf, other]
Title: Towards Sustainable Deep Learning for Wireless Fingerprinting Localization
Comments: 6 pages, 6 sections, accepted to ICC 2022
Subjects: Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)

Location based services, already popular with end users, are now inevitably becoming part of new wireless infrastructures and emerging business processes. The increasingly popular Deep Learning (DL) artificial intelligence methods perform very well in wireless fingerprinting localization based on extensive indoor radio measurement data. However, with increasing complexity these methods become computationally very intensive and energy hungry, both in training and in subsequent operation. Considering only mobile users, estimated to exceed 7.4 billion by the end of 2025, and assuming that the networks serving these users will need to perform only one localization per user per hour on average, the machine learning models used for the calculation would need to perform 65*10^12 predictions per year. Add to this equation tens of billions of other connected devices and applications that rely heavily on more frequent location updates, and it becomes apparent that localization will contribute significantly to carbon emissions unless more energy-efficient models are developed and used. This motivated our work on a new DL-based architecture for indoor localization that is more energy efficient than related state-of-the-art approaches while showing only marginal performance degradation. A detailed performance evaluation shows that the proposed model produces only 58 % of the carbon footprint while maintaining 98.7 % of the overall performance compared to a state-of-the-art model external to our group. Additionally, we elaborate on a methodology to calculate the complexity of the DL model and thus the CO2 footprint during its training and operation.
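The yearly prediction count quoted in the abstract follows directly from its two stated assumptions (7.4 billion users, one localization per user per hour). The short check below reproduces the arithmetic; the variable names are illustrative, and a 365-day year is assumed.

```python
mobile_users = 7.4e9                    # users estimated in the abstract for end of 2025
localizations_per_user_per_hour = 1     # the abstract's stated average
hours_per_year = 24 * 365               # assumes a non-leap year

predictions_per_year = (mobile_users
                        * localizations_per_user_per_hour
                        * hours_per_year)

# About 6.48e13, which rounds to the 65*10^12 quoted in the abstract.
print(f"{predictions_per_year:.3e} predictions per year")
```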

[65]  arXiv:2201.09075 (cross-list from cs.NI) [pdf, ps, other]
Title: Dynamic Channel Access via Meta-Reinforcement Learning
Subjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG); Systems and Control (eess.SY)

In this paper, we address the channel access problem in a dynamic wireless environment via meta-reinforcement learning. Spectrum is a scarce resource in wireless communications, especially with the dramatic increase in the number of devices in networks. Recently, inspired by the success of deep reinforcement learning (DRL), extensive studies have been conducted in addressing wireless resource allocation problems via DRL. However, training DRL algorithms usually requires a massive amount of data collected from the environment for each specific task and the well-trained model may fail if there is a small variation in the environment. In this work, in order to address these challenges, we propose a meta-DRL framework that incorporates the method of Model-Agnostic Meta-Learning (MAML). In the proposed framework, we train a common initialization for similar channel selection tasks. From the initialization, we show that only a few gradient descents are required for adapting to different tasks drawn from the same distribution. We demonstrate the performance improvements via simulation results.

[66]  arXiv:2201.09110 (cross-list from cs.SD) [pdf, other]
Title: Exploring auditory acoustic features for the diagnosis of the Covid-19
Comments: Accepted in ICASSP 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

The current coronavirus outbreak has quickly escalated into a serious global problem that has now been declared a Public Health Emergency of International Concern by the World Health Organization. Infectious diseases know no borders, so when it comes to controlling outbreaks, timing is essential: threats must be detected as early as possible, before they spread. After a first successful DiCOVA challenge, the organisers released a second DiCOVA challenge with the aim of diagnosing COVID-19 through the use of breath, cough and speech audio samples. This work presents the details of an automatic system for COVID-19 detection using breath, cough and speech recordings. We developed different front-end auditory acoustic features along with a bidirectional Long Short-Term Memory (bi-LSTM) network as the classifier. The results are promising and demonstrate the highly complementary behaviour of the auditory acoustic features in the Breathing, Cough and Speech tracks, giving an AUC of 86.60% on the test set.

[67]  arXiv:2201.09112 (cross-list from cs.RO) [pdf, other]
Title: Neural Network based Interactive Lane Changing Planner in Dense Traffic with Safety Guarantee
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

Neural network based planners have shown great promise in improving performance and task success rate in autonomous driving. However, it is very challenging to ensure safety of the system with learning enabled components, especially in dense and highly interactive traffic environments. In this work, we propose a neural network based lane changing planner framework that can ensure safety while sustaining system efficiency. To prevent overly conservative planning, we assess the aggressiveness and identify the driving behavior of surrounding vehicles, then adapt the planned trajectory for the ego vehicle accordingly. The ego vehicle can proceed to change lanes if a safe evasion trajectory exists even in the worst case; otherwise, it can hesitate around the current lateral position or return to the original lane. We also quantitatively demonstrate the effectiveness of our planner design and its advantage over other baselines through extensive simulations with diverse and comprehensive experimental settings.

[68]  arXiv:2201.09120 (cross-list from cs.CV) [pdf, other]
Title: Investigating the Potential of Auxiliary-Classifier GANs for Image Classification in Low Data Regimes
Comments: 4 pages content, 1 page references, 3 figures, 2 tables, to appear in ICASSP 2022
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Generative Adversarial Networks (GANs) have shown promise in augmenting datasets and boosting convolutional neural networks' (CNN) performance on image classification tasks. But they introduce more hyperparameters to tune as well as the need for additional time and computational power to train supplementary to the CNN. In this work, we examine the potential for Auxiliary-Classifier GANs (AC-GANs) as a 'one-stop-shop' architecture for image classification, particularly in low data regimes. Additionally, we explore modifications to the typical AC-GAN framework, changing the generator's latent space sampling scheme and employing a Wasserstein loss with gradient penalty to stabilize the simultaneous training of image synthesis and classification. Through experiments on images of varying resolutions and complexity, we demonstrate that AC-GANs show promise in image classification, achieving competitive performance with standard CNNs. These methods can be employed as an 'all-in-one' framework with particular utility in the absence of large amounts of training data.

[69]  arXiv:2201.09130 (cross-list from cs.AI) [pdf]
Title: Artificial Intelligence for Suicide Assessment using Audiovisual Cues: A Review
Comments: Manuscript submitted to Artificial Intelligence Reviews
Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Death by suicide is the seventh leading cause of death worldwide. The recent advancement in Artificial Intelligence (AI), specifically AI application in image and voice processing, has created a promising opportunity to revolutionize suicide risk assessment. Subsequently, we have witnessed a fast-growing body of research that applies AI to extract audiovisual non-verbal cues for mental illness assessment. However, the majority of the recent works focus on depression, despite the evident difference between depression signs and suicidal behavior non-verbal cues. In this paper, we review the recent works that study suicide ideation and suicide behavior detection through audiovisual feature analysis, mainly suicidal voice/speech acoustic features analysis and suicidal visual cues.

[70]  arXiv:2201.09145 (cross-list from cs.LG) [pdf, other]
Title: GLassoformer: A Query-Sparse Transformer for Post-Fault Power Grid Voltage Prediction
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)

We propose GLassoformer, a novel and efficient transformer architecture leveraging group Lasso regularization to reduce the number of queries of the standard self-attention mechanism. Due to the sparsified queries, GLassoformer is more computationally efficient than the standard transformers. On the power grid post-fault voltage prediction task, GLassoformer shows remarkably better prediction than many existing benchmark algorithms in terms of accuracy and stability.

[71]  arXiv:2201.09152 (cross-list from cs.CV) [pdf, other]
Title: Generative Adversarial Network Applications in Creating a Meta-Universe
Comments: Computational Science and Computational Intelligence; 2021 International Conference on IEEE CPS (IEEE XPLORE, Scopus), IEEE, 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)

Generative Adversarial Networks (GANs) are machine learning methods that are used in many important and novel applications. For example, in imaging science, GANs are effectively utilized in generating image datasets, photographs of human faces, image and video captioning, image-to-image translation, text-to-image translation, video prediction, and 3D object generation to name a few. In this paper, we discuss how GANs can be used to create an artificial world. More specifically, we discuss how GANs help to describe an image utilizing image/video captioning methods and how to translate the image to a new image using image-to-image translation frameworks in a theme we desire. We articulate how GANs impact creating a customized world.

[72]  arXiv:2201.09165 (cross-list from cs.MM) [pdf, other]
Title: A Pre-trained Audio-Visual Transformer for Emotion Recognition
Comments: Accepted by IEEE ICASSP 2022
Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)

In this paper, we introduce a pre-trained audio-visual Transformer trained on more than 500k utterances from nearly 4000 celebrities from the VoxCeleb2 dataset for human behavior understanding. The model aims to capture and extract useful information from the interactions between human facial and auditory behaviors, with application in emotion recognition. We evaluate the model performance on two datasets, namely CREMA-D (emotion classification) and MSP-IMPROV (continuous emotion regression). Experimental results show that fine-tuning the pre-trained model helps improve emotion classification accuracy by 5-7% and Concordance Correlation Coefficients (CCC) in continuous emotion recognition by 0.03-0.09 compared to the same model trained from scratch. We also demonstrate the robustness of fine-tuning the pre-trained model in a low-resource setting. With only 10% of the original training set provided, fine-tuning the pre-trained model can lead to at least 10% better emotion recognition accuracy and a CCC score improvement by at least 0.1 for continuous emotion recognition.
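
For readers unfamiliar with the Concordance Correlation Coefficient reported above, a minimal sketch of the standard (Lin's) definition follows. The function name and the sample values are illustrative assumptions; the abstract does not say how the authors computed CCC in practice.

```python
import numpy as np

def concordance_cc(y_true, y_pred):
    """Lin's concordance correlation coefficient between two 1-D sequences.

    Uses population (biased) variance and covariance, as in the usual definition.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mean_t, mean_p = y_true.mean(), y_pred.mean()
    var_t, var_p = y_true.var(), y_pred.var()
    cov = np.mean((y_true - mean_t) * (y_pred - mean_p))
    # Unlike Pearson correlation, the denominator penalizes a shift in means.
    return 2.0 * cov / (var_t + var_p + (mean_t - mean_p) ** 2)

if __name__ == "__main__":
    print(concordance_cc([1, 2, 3], [1, 2, 3]))  # perfect agreement
    print(concordance_cc([1, 2, 3], [2, 3, 4]))  # same shape, constant offset
```

A constant offset between predictions and labels leaves Pearson correlation at 1 but lowers CCC, which is why CCC is often preferred for continuous emotion regression.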

[73]  arXiv:2201.09167 (cross-list from cs.CV) [pdf, other]
Title: Mixed X-Ray Image Separation for Artworks with Concealed Designs
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

In this paper, we focus on X-ray images of paintings with concealed sub-surface designs (e.g., deriving from reuse of the painting support or revision of a composition by the artist), which include contributions from both the surface painting and the concealed features. In particular, we propose a self-supervised deep learning-based image separation approach that can be applied to the X-ray images from such paintings to separate them into two hypothetical X-ray images. One of these reconstructed images is related to the X-ray image of the concealed painting, while the second one contains only information related to the X-ray of the visible painting. The proposed separation network consists of two components: the analysis and the synthesis sub-networks. The analysis sub-network is based on learned coupled iterative shrinkage thresholding algorithms (LCISTA) designed using algorithm unrolling techniques, and the synthesis sub-network consists of several linear mappings. The learning algorithm operates in a totally self-supervised fashion without requiring a sample set that contains both the mixed X-ray images and the separated ones. The proposed method is demonstrated on a real painting with concealed content, Doña Isabel de Porcel by Francisco de Goya, to show its effectiveness.

[74]  arXiv:2201.09208 (cross-list from cs.CV) [pdf]
Title: Design of Sensor Fusion Driver Assistance System for Active Pedestrian Safety
Comments: The 14th International Conference on Automation Technology (Automation 2017), December 8-10, 2017, Kaohsiung, Taiwan
Subjects: Computer Vision and Pattern Recognition (cs.CV); Signal Processing (eess.SP)

In this paper, we present a parallel architecture for a sensor fusion detection system that combines a camera and a 1D light detection and ranging (lidar) sensor for object detection. The system contains two object detection methods, one based on optical flow, and the other using lidar. The two sensors can effectively compensate for each other's shortcomings. An accurate longitudinal position of the object and its lateral movement information can be obtained simultaneously. Using a spatio-temporal alignment and a sensor fusion policy, we completed the development of a fusion detection system with high reliability at distances of up to 20 m. Test results show that the proposed system achieves a high level of accuracy for pedestrian or object detection in front of a vehicle, and is highly robust to special environments.

[75]  arXiv:2201.09285 (cross-list from cs.RO) [pdf, other]
Title: Multi-AAV Cooperative Path Planning using Nonlinear Model Predictive Control with Localization Constraints
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

In this paper, we solve a joint cooperative localization and path planning problem for a group of Autonomous Aerial Vehicles (AAVs) in GPS-denied areas using nonlinear model predictive control (NMPC). A moving horizon estimator (MHE) is used to estimate the vehicle states with the help of relative bearing information to known landmarks and other vehicles. The goal of the NMPC is to devise optimal paths for each vehicle between a given source and destination while maintaining desired localization accuracy. Estimating localization covariance in the NMPC is computationally intensive, hence we develop an approximate analytical closed form expression based on the relationship between covariance and path lengths to landmarks. Using this expression while computing NMPC commands reduces the computational complexity significantly. We present numerical simulations to validate the proposed approach for different numbers of vehicles and landmark configurations. We also compare the results with EKF-based estimation to show the superiority of the proposed closed form approach.

[76]  arXiv:2201.09318 (cross-list from cs.CV) [pdf, other]
Title: Sparse-view Cone Beam CT Reconstruction using Data-consistent Supervised and Adversarial Learning from Scarce Training Data
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV); Signal Processing (eess.SP)

Reconstruction of CT images from a limited set of projections through an object is important in several applications ranging from medical imaging to industrial settings. As the number of available projections decreases, traditional reconstruction techniques such as the FDK algorithm and model-based iterative reconstruction methods perform poorly. Recently, data-driven methods such as deep learning-based reconstruction have garnered a lot of attention in applications because they yield better performance when enough training data is available. However, even these methods have their limitations when there is a scarcity of available training data. This work focuses on image reconstruction in such settings, i.e., when both the number of available CT projections and the training data is extremely limited. We adopt a sequential reconstruction approach over several stages using an adversarially trained shallow network for 'destreaking' followed by a data-consistency update in each stage. To deal with the challenge of limited data, we use image subvolumes to train our method, and patch aggregation during testing. To deal with the computational challenge of learning on 3D datasets for 3D reconstruction, we use a hybrid 3D-to-2D mapping network for the 'destreaking' part. Comparisons to other methods over several test examples indicate that the proposed method has much potential, when both the number of projections and available training data are highly limited.

[77]  arXiv:2201.09323 (cross-list from quant-ph) [pdf, other]
Title: Dual-Frequency Quantum Phase Estimation Mitigates the Spectral Leakage of Quantum Algorithms
Comments: 6 pages, 8 figures
Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT); Systems and Control (eess.SY)

Quantum phase estimation is an important component in diverse quantum algorithms. However, it suffers from spectral leakage, when the reciprocal of the record length is not an integer multiple of the unknown phase, which incurs an accuracy degradation. For the existing single-sample estimation scheme, window-based methods have been proposed for spectral leakage mitigation. As a further advance, we propose a dual-frequency estimator, which asymptotically approaches the Cramer-Rao bound, when multiple samples are available. Numerical results show that the proposed estimator outperforms the existing window-based methods, when the number of samples is sufficiently high.
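
Spectral leakage is easiest to see in its classical analogue, the discrete Fourier transform of a single complex tone. The sketch below is only that analogue, under the assumption that it is a helpful illustration; it does not implement quantum phase estimation, the window-based methods, or the proposed dual-frequency estimator, and the function name is hypothetical.

```python
import numpy as np

def peak_energy_fraction(freq_bins, n=64):
    """Fraction of spectral energy in the largest DFT bin for one complex tone.

    freq_bins is the tone frequency measured in DFT bins; an integer value
    means the tone lies exactly on the frequency grid.
    """
    t = np.arange(n)
    x = np.exp(2j * np.pi * freq_bins * t / n)
    spectrum = np.abs(np.fft.fft(x)) ** 2
    return spectrum.max() / spectrum.sum()

if __name__ == "__main__":
    print(peak_energy_fraction(5.0))  # on-grid tone: energy in one bin
    print(peak_energy_fraction(5.5))  # off-grid tone: energy leaks to neighbours
```

When the tone sits halfway between two bins, well under half of the energy remains in the strongest bin, which is the accuracy degradation that windowing and multi-sample estimators aim to mitigate.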

[78]  arXiv:2201.09351 (cross-list from stat.AP) [pdf]
Title: The risk of bias in denoising methods
Authors: Kendrick Kay (Center for Magnetic Resonance Research (CMRR), Department of Radiology, University of Minnesota)
Comments: 19 pages, 4 figures
Subjects: Applications (stat.AP); Image and Video Processing (eess.IV); Quantitative Methods (q-bio.QM)

Experimental datasets are growing rapidly in size, scope, and detail, but the value of these datasets is limited by unwanted measurement noise. It is therefore tempting to apply analysis techniques that attempt to reduce noise and enhance signals of interest. In this paper, we draw attention to the possibility that denoising methods may introduce bias and lead to incorrect scientific inferences. To present our case, we first review the basic statistical concepts of bias and variance. Denoising techniques typically reduce variance observed across repeated measurements, but this can come at the expense of introducing bias to the average expected outcome. We then conduct three simple simulations that provide concrete examples of how bias may manifest in everyday situations. These simulations reveal several findings that may be surprising and counterintuitive: (i) different methods can be equally effective at reducing variance but some incur bias while others do not, (ii) identifying methods that better recover ground truth does not guarantee the absence of bias, (iii) bias can arise even if one has specific knowledge of properties of the signal of interest. We suggest that researchers should consider and possibly quantify bias before deploying denoising methods on important research data.
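
The variance-versus-bias trade-off described above can be reproduced with a small simulation. This is a sketch under assumed settings (a narrow Gaussian bump as ground truth, additive Gaussian noise, a moving-average denoiser); it is not one of the paper's three simulations, and all names and parameter values are hypothetical.

```python
import numpy as np

def simulate(n_trials=2000, n_points=51, noise_sd=1.0, width=5, seed=0):
    """Compare raw and moving-average-smoothed estimates at the peak of a bump."""
    rng = np.random.default_rng(seed)
    x = np.arange(n_points)
    peak = n_points // 2
    truth = np.exp(-0.5 * ((x - peak) / 2.0) ** 2)  # narrow bump with height 1
    noisy = truth + rng.normal(0.0, noise_sd, size=(n_trials, n_points))
    kernel = np.ones(width) / width
    smoothed = np.array([np.convolve(row, kernel, mode="same") for row in noisy])
    stats = {}
    for name, data in (("raw", noisy), ("smoothed", smoothed)):
        stats[name] = {
            "bias": data[:, peak].mean() - truth[peak],  # average error at the peak
            "variance": data[:, peak].var(),             # spread across repeated trials
        }
    return stats

if __name__ == "__main__":
    for name, s in simulate().items():
        print(name, s)
```

Smoothing cuts the trial-to-trial variance at the peak, but because the bump is narrower than the averaging window, the smoothed estimate systematically underestimates the peak height.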

[79]  arXiv:2201.09355 (cross-list from cs.CV) [pdf, ps, other]
Title: Transformer-based SAR Image Despeckling
Comments: Submitted to International Geoscience and Remote Sensing Symposium (IGARSS), 2022. Our code is available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Synthetic Aperture Radar (SAR) images are usually degraded by a multiplicative noise known as speckle, which makes processing and interpretation of SAR images difficult. In this paper, we introduce a transformer-based network for SAR image despeckling. The proposed despeckling network comprises a transformer-based encoder which allows the network to learn global dependencies between different image regions, aiding in better despeckling. The network is trained end-to-end with synthetically generated speckled images using a composite loss function. Experiments show that the proposed method achieves significant improvements over traditional and convolutional neural network-based despeckling methods on both synthetic and real SAR images.

[80]  arXiv:2201.09382 (cross-list from cs.IT) [pdf, other]
Title: Iterative Joint Parameters Estimation and Decoding in a Distributed Receiver for Satellite Applications and Relevant Cramer-Rao Bounds
Comments: 19 pages, 11 figures
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

This paper presents an algorithm for iterative joint channel parameter (carrier phase, Doppler shift and Doppler rate) estimation and decoding of transmission over channels affected by Doppler shift and Doppler rate using a distributed receiver. This algorithm is derived by applying the sum-product algorithm (SPA) to a factor graph representing the joint a posteriori distribution of the information symbols and channel parameters given the channel output. In this paper, we present two methods for dealing with intractable messages of the sum-product algorithm. In the first approach, we use particle filtering with sequential importance sampling (SIS) for the estimation of the unknown parameters. We also propose a method for fine-tuning of particles for improved convergence. In the second approach, we approximate our model with a random walk phase model, followed by a phase tracking algorithm and polynomial regression algorithm to estimate the unknown parameters. We derive the Weighted Bayesian Cramer-Rao Bounds (WBCRBs) for joint carrier phase, Doppler shift and Doppler rate estimation, which take into account the prior distribution of the estimation parameters and are accurate lower bounds for all considered Signal to Noise Ratio (SNR) values. Numerical results (of bit error rate (BER) and the mean-square error (MSE) of parameter estimation) suggest that phase tracking with the random walk model slightly outperforms particle filtering. However, particle filtering has a lower computational cost than the random walk model based method.

[81]  arXiv:2201.09410 (cross-list from cs.IT) [pdf, other]
Title: Map-Assisted Material Identification at 100 GHz and Above Using Radio Access Technology
Authors: Yi Geng
Comments: Submitted to EUCNC & 6G Summit 2022, 6 pages
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

The inclusion of material identification in wireless communication systems is an emerging area that offers many opportunities for 6G systems. By using reflected radio waves to determine the material of a reflecting surface, not only can the performance of 6G networks be improved, but some exciting applications can also be developed. In this paper, we recap a few prior methods for material identification, then analyze the impact of the thickness of the reflecting surface on the reflection coefficient and present a new concept, "settling thickness", which indicates the minimum thickness of the reflecting surface needed to induce a steady reflection coefficient. Finally, we propose a novel material identification method based on ray-tracing and a 3D map. Compared to some prior methods that can be implemented in single-bounce-reflection scenarios only, we extend the capability of the method to multiple-bounce-reflection scenarios.

[82]  arXiv:2201.09414 (cross-list from cs.IT) [pdf, other]
Title: Generalized Spatially-Coupled Parallel Concatenated Codes With Partial Repetition
Comments: 33 pages, 9 figures, 3 tables. arXiv admin note: text overlap with arXiv:2105.00698
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

A new class of spatially-coupled turbo-like codes (SC-TCs), dubbed generalized spatially coupled parallel concatenated codes (GSC-PCCs), is introduced. These codes are constructed by applying spatial coupling on parallel concatenated codes (PCCs) with a fraction of information bits repeated $q$ times. GSC-PCCs can be seen as a generalization of the original spatially-coupled parallel concatenated codes proposed by Moloudi et al. [2]. To characterize the asymptotic performance of GSC-PCCs, we derive the corresponding density evolution equations and compute their decoding thresholds. The threshold saturation effect is observed and proven. Most importantly, we rigorously prove that any rate-$R$ GSC-PCC ensemble with 2-state convolutional component codes achieves at least a fraction $1-\frac{R}{R+q}$ of the capacity of the binary erasure channel (BEC) for repetition factor $q\geq2$ and this multiplicative gap vanishes as $q$ tends to infinity. To the best of our knowledge, this is the first class of SC-TCs that is proven to be capacity-achieving. Further, the connection between the strength of the component codes, the decoding thresholds of GSC-PCCs, and the repetition factor is established. The superiority of the proposed codes with finite blocklength is exemplified by comparing their error performance with that of existing SC-TCs via computer simulations.
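
To get a feel for the stated lower bound $1-\frac{R}{R+q}$, the short sketch below simply evaluates it for a few repetition factors. The rate R = 1/2 is a hypothetical example value and the function name is an assumption; nothing here reproduces the paper's density evolution analysis.

```python
from fractions import Fraction

def capacity_fraction(rate, q):
    """Evaluate the lower bound 1 - R / (R + q) quoted in the abstract."""
    if q < 2:
        raise ValueError("the abstract states the bound for repetition factor q >= 2")
    rate = Fraction(rate)
    return 1 - rate / (rate + q)

if __name__ == "__main__":
    # Hypothetical rate R = 1/2; the bound approaches 1 as q grows.
    for q in (2, 4, 10, 100):
        print(q, capacity_fraction(Fraction(1, 2), q))
```

Since $1-\frac{R}{R+q} = \frac{q}{R+q}$, the guaranteed fraction of capacity increases monotonically with $q$, consistent with the statement that the multiplicative gap vanishes as $q$ tends to infinity.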

[83]  arXiv:2201.09415 (cross-list from cs.IT) [pdf, other]
Title: Sub-Block Rearranged Staircase Codes
Authors: Min Qiu, Jinhong Yuan
Comments: 32 pages, 6 figures, 2 tables
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

We propose a new family of spatially coupled product codes, called sub-block rearranged staircase (SR-staircase) codes. Each SR-staircase code block is constructed by encoding rearranged preceding code blocks, where the rearrangement involves sub-block decomposition and transposition. The major advantage of the proposed construction over the conventional staircase construction is that it enables the use of stronger algebraic component codes to achieve better waterfall and error floor performance with lower miscorrection probability under low-complexity iterative bounded distance decoding (iBDD) while having the same code rate and similar blocklength as the conventional staircase codes. We characterize the decoding threshold of the proposed codes under iBDD by using density evolution and also derive the conditions under which they achieve a better decoding threshold than that of the conventional staircase codes. Further, we investigate the error floor performance by analyzing the contributing error patterns and their multiplicities. Both theoretical and simulation results show that SR-staircase codes outperform the conventional staircase codes in terms of waterfall and error floor while the performance can be further improved by using a large coupling width.

[84]  arXiv:2201.09421 (cross-list from cs.CV) [pdf, other]
Title: Mutual Attention-based Hybrid Dimensional Network for Multimodal Imaging Computer-aided Diagnosis
Comments: 11 pages, 8 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Recent works on Multimodal 3D Computer-aided diagnosis have demonstrated that obtaining a competitive automatic diagnosis model when a 3D convolution neural network (CNN) brings more parameters and medical images are scarce remains nontrivial and challenging. Considering both consistencies of regions of interest in multimodal images and diagnostic accuracy, we propose a novel mutual attention-based hybrid dimensional network for MultiModal 3D medical image classification (MMNet). The hybrid dimensional network integrates 2D CNN with 3D convolution modules to generate deeper and more informative feature maps, and reduce the training complexity of 3D fusion. Besides, the pre-trained model of ImageNet can be used in 2D CNN, which improves the performance of the model. The stereoscopic attention is focused on building rich contextual interdependencies of the region in 3D medical images. To improve the regional correlation of pathological tissues in multimodal medical images, we further design a mutual attention framework in the network to build the region-wise consistency in similar stereoscopic regions of different image modalities, providing an implicit manner to instruct the network to focus on pathological tissues. MMNet outperforms many previous solutions and achieves results competitive to the state-of-the-art on three multimodal imaging datasets, i.e., Parotid Gland Tumor (PGT) dataset, the MRNet dataset, and the PROSTATEx dataset, and its advantages are validated by extensive experiments.

[85]  arXiv:2201.09429 (cross-list from cs.SD) [pdf, other]
Title: End-to-End Neural Audio Coding for Real-Time Communications
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

Deep-learning based methods have shown their advantages in audio coding over traditional ones, but limited attention has been paid to real-time communications (RTC). This paper proposes the TFNet, an end-to-end neural audio codec with low latency for RTC. It takes an encoder-temporal filtering-decoder paradigm that has seldom been investigated in audio coding. An interleaved structure is proposed for temporal filtering to capture both short-term and long-term temporal dependencies. Furthermore, with end-to-end optimization, the TFNet is jointly optimized with speech enhancement and packet loss concealment, yielding a one-for-all network for three tasks. Both subjective and objective results demonstrate the efficiency of the proposed TFNet.

[86]  arXiv:2201.09436 (cross-list from cs.IT) [pdf, other]
Title: OptM3Sec: Optimizing Multicast IRS-Aided Multiantenna DFRC Secrecy Channel with Multiple Eavesdroppers
Comments: 5 pages, 2 figures
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

With the use of common signaling methods for dual-function radar-communications (DFRC) systems, the susceptibility of eavesdropping on messages aimed at legitimate users has worsened. For DFRC systems, the radar target may act as an eavesdropper (ED) that receives a high-energy signal thereby leading to additional challenges. Unlike prior works, we consider a multicast multi-antenna DFRC system with multiple EDs. We then propose a physical layer design approach to maximize the secrecy rate by installing intelligent reflecting surfaces in the radar channels. Our optimization of multiple ED multicast multi-antenna DFRC secrecy rate (OptM3Sec) approach solves this highly nonconvex problem with respect to the precoding matrices. Our numerical experiments demonstrate the feasibility of our algorithm in maximizing the secrecy rate in this DFRC setup.

[87]  arXiv:2201.09458 (cross-list from cs.RO) [pdf]
Title: Hybrid Adaptive Control for Series Elastic Actuator of Humanoid Robot
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

Humanoid robots usually suffer significant impact forces when walking or running in a non-predefined environment, which can easily damage the actuators due to their high stiffness. In recent years, passive, compliant series elastic actuators (SEAs) have proved capable of driving a humanoid's joints in many respects. However, despite being widely applied in the biped robot research field, stable control of a humanoid powered by SEAs, especially during walking, remains a challenge. This paper proposes a model reference adaptive control (MRAC) combined with the backstepping algorithm to deal with the parameter uncertainties in a humanoid's lower limb driven by the SEA system. This is also an extension of our previous research (Lanh et al., 2021). Firstly, a dynamic model of the SEA is obtained. Secondly, since there are unknown and uncertain parameters in the SEA model, a model reference adaptive controller (MRAC) is employed to guarantee the robust performance of the humanoid's lower limb. Finally, an experiment is carried out to evaluate the effectiveness of the proposed controller and the SEA mechanism.

[88]  arXiv:2201.09461 (cross-list from math.OC) [pdf, ps, other]
Title: A Partially Distributed Fixed-Time Economic Dispatch Algorithm with Kron's Modeled Power Transmission Losses
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)

A partially distributed economic dispatch algorithm, which renders the optimal value in fixed time with the objective of supplying the load requirement as well as the power transmission losses, is proposed in this paper. The transmission losses are modeled using Kron's $\mathcal{B}-$loss formula, under a standard assumption on the values of $\mathcal{B}-$coefficients. The total power supplied by the generators is subjected to time-varying equality constraints due to the time-varying nature of the transmission losses. Using Lyapunov and optimization theory, we rigorously prove the convergence of the proposed algorithm and show that the optimal value of power is reached within a fixed time, whose upper bound depends on the values of $\mathcal{B}-$coefficients, parameters characterizing the convexity of the cost functions associated with each generator and the interaction topology among them. Finally, an example is simulated to illustrate the theoretical results.
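
As background, Kron's loss formula is commonly written as a quadratic function of the generator outputs, $P_L = P^\top \mathcal{B} P + \mathcal{B}_0^\top P + \mathcal{B}_{00}$. The sketch below evaluates that commonly stated form; whether the paper keeps the linear and constant terms is not given in the abstract, and the coefficient values, units, and function name are hypothetical.

```python
import numpy as np

def kron_loss(p, B, B0=None, B00=0.0):
    """Transmission loss from the commonly stated Kron B-coefficient formula.

    p   : generator outputs
    B   : square matrix of B-coefficients (quadratic term)
    B0  : optional vector of linear coefficients
    B00 : optional constant term
    """
    p = np.asarray(p, dtype=float)
    B = np.asarray(B, dtype=float)
    loss = p @ B @ p
    if B0 is not None:
        loss += np.asarray(B0, dtype=float) @ p
    return float(loss + B00)

if __name__ == "__main__":
    # Hypothetical two-generator example with a diagonal B matrix.
    print(kron_loss([100.0, 200.0], [[1e-4, 0.0], [0.0, 2e-4]]))
```

Because the loss depends on the dispatch itself, the power balance constraint (generation equals load plus loss) changes as the algorithm updates the generator outputs, which is the time-varying equality constraint mentioned in the abstract.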

[89]  arXiv:2201.09472 (cross-list from cs.SD) [pdf, other]
Title: Disentangling Style and Speaker Attributes for TTS Style Transfer
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

End-to-end neural TTS has shown improved performance in speech style transfer. However, the improvement is still limited by the available training data in both target styles and speakers. Additionally, degraded performance is observed when the trained TTS tries to transfer the speech to a target style from a new speaker with an unknown, arbitrary style. In this paper, we propose a new approach to seen and unseen style transfer training on disjoint, multi-style datasets, i.e., datasets of different styles are recorded, one individual style by one speaker in multiple utterances. An inverse autoregressive flow (IAF) technique is first introduced to improve the variational inference for learning an expressive style representation. A speaker encoder network is then developed for learning a discriminative speaker embedding, which is jointly trained with the rest of the neural TTS modules. The proposed approach of seen and unseen style transfer is effectively trained with six specifically-designed objectives: reconstruction loss, adversarial loss, style distortion loss, cycle consistency loss, style classification loss, and speaker classification loss. Experiments demonstrate, both objectively and subjectively, the effectiveness of the proposed approach for seen and unseen style transfer tasks. The performance of our approach is superior to and more robust than those of four other reference systems of prior art.

[90]  arXiv:2201.09483 (cross-list from cs.LG) [pdf, other]
Title: A Machine Learning Framework for Distributed Functional Compression over Wireless Channels in IoT
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Information Theory (cs.IT); Signal Processing (eess.SP); Machine Learning (stat.ML)

IoT devices generating enormous amounts of data and state-of-the-art machine learning techniques together will revolutionize cyber-physical systems. In many diverse fields, from autonomous driving to augmented reality, distributed IoT devices compute specific target functions without simple forms, like obstacle detection, object recognition, etc. Traditional cloud-based methods that focus on transferring data to a central location either for training or inference place enormous strain on network resources. To address this, we develop, to the best of our knowledge, the first machine learning framework for distributed functional compression over both the Gaussian Multiple Access Channel (GMAC) and orthogonal AWGN channels. Due to the Kolmogorov-Arnold representation theorem, our machine learning framework can, by design, compute any arbitrary function for the desired functional compression task in IoT. Importantly, the raw sensory data are never transferred to a central node for training or inference, thus reducing communication. For these algorithms, we provide theoretical convergence guarantees and upper bounds on communication. Our simulations show that the learned encoders and decoders for functional compression perform significantly better than traditional approaches and are robust to channel condition changes and sensor outages. Compared to the cloud-based scenario, our algorithms reduce channel use by two orders of magnitude.

[91]  arXiv:2201.09486 (cross-list from cs.SD) [pdf, other]
Title: Bias in Automated Speaker Recognition
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Computers and Society (cs.CY); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

Automated speaker recognition uses data processing to identify speakers by their voice. Today, automated speaker recognition technologies are deployed on billions of smart devices and in services such as call centres. Despite their wide-scale deployment and known sources of bias in face recognition and natural language processing, bias in automated speaker recognition has not been studied systematically. We present an in-depth empirical and analytical study of bias in the machine learning development workflow of speaker verification, a voice biometric and core task in automated speaker recognition. Drawing on an established framework for understanding sources of harm in machine learning, we show that bias exists at every development stage in the well-known VoxCeleb Speaker Recognition Challenge, including model building, implementation, and data generation. Most affected are female speakers and non-US nationalities, who experience significant performance degradation. Leveraging the insights from our findings, we make practical recommendations for mitigating bias in automated speaker recognition, and outline future research directions.

[92]  arXiv:2201.09493 (cross-list from cs.CR) [pdf, other]
Title: STRIDE-based Cyber Security Threat Modeling for IoT-enabled Precision Agriculture Systems
Subjects: Cryptography and Security (cs.CR); Systems and Control (eess.SY)

The concept of traditional farming is changing rapidly with the introduction of smart technologies like the Internet of Things (IoT). Under the concept of smart agriculture, precision agriculture is gaining popularity to enable Decision Support System (DSS)-based farming management that utilizes widespread IoT sensors and wireless connectivity to enable automated detection and optimization of resources. Undoubtedly, the success of such a system would benefit crop productivity, whereas its failure would have a severe impact. As with many other cyber-physical systems, one of the growing challenges in avoiding system adversity is ensuring the system's security, privacy, and trust. But what are the vulnerabilities, threats, and security issues we should consider while deploying precision agriculture? This paper conducts holistic threat modeling at the component level of precision agriculture's standard infrastructure, using the popular threat intelligence tool STRIDE to identify common security issues. Our modeling identifies fifty-eight potential security threats to consider. We present them systematically and offer general mitigation suggestions to support cyber security in precision agriculture.

[93]  arXiv:2201.09507 (cross-list from cs.IT) [pdf, ps, other]
Title: Beamforming Towards Seamless Sensing Coverage for Cellular Integrated Sensing and Communication
Comments: 6 Pages
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

The sixth generation (6G) mobile communication networks are expected to offer a new paradigm of cellular integrated sensing and communication (ISAC). However, due to the intrinsic difference between sensing and communication in terms of coverage requirement, current cellular networks that are deliberately planned mainly for communication coverage can hardly achieve seamless sensing coverage. To address this issue, this paper studies the beamforming optimization towards seamless sensing coverage for a basic bi-static ISAC system, while ensuring that the communication requirements of multiple user equipments (UEs) are satisfied. Towards this end, an optimization problem is formulated to maximize the worst-case sensing signal-to-noise ratio (SNR) in a prescribed coverage region, subject to the signal-to-interference-plus-noise ratio (SINR) requirement for each UE. To gain some insights, we first investigate the special case with one single UE and one single sensing point, for which a closed-form expression of the optimal beamforming is obtained. For the general case with multiple communication UEs and contiguous regional sensing coverage, an efficient algorithm based on successive convex approximation (SCA) is proposed to solve the non-convex beamforming optimization problem. Numerical results demonstrate that the proposed design is able to achieve seamless sensing coverage in the prescribed region, while guaranteeing the communication requirements of the UEs.

[94]  arXiv:2201.09528 (cross-list from cs.IT) [pdf, ps, other]
Title: Robust Trajectory and Communication Design in IRS-Assisted UAV Communication under Malicious Jamming
Comments: This paper studied the joint design of UAV trajectory and IRS passive beamforming in IRS-aided UAV communication in presence of a jammer, whose location is unknown
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

In this paper, we study an unmanned aerial vehicle (UAV) communication system, where a ground node (GN) communicates with a UAV assisted by an intelligent reflecting surface (IRS) in the presence of a jammer with imperfect location information. We aim to improve the achievable average rate via the joint robust design of UAV trajectory, IRS passive beamforming and GN's power allocation. However, the formulated optimization problem is challenging to solve due to its non-convexity and coupled variables. To overcome the difficulty, we propose an alternating optimization (AO) based algorithm to solve it sub-optimally by leveraging semidefinite relaxation (SDR), successive convex approximation (SCA), and S-procedure methods. Simulation results show that by deploying the IRS near the GN, our proposed algorithm always improves the uplink achievable average rate significantly compared with the benchmark algorithms, while deploying the IRS near the jammer is effective only when the jammer's location is perfectly known.

[95]  arXiv:2201.09531 (cross-list from cs.LG) [pdf, other]
Title: Communication-Efficient Stochastic Zeroth-Order Optimization for Federated Learning
Comments: 13 pages, 4 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)

Federated learning (FL), as an emerging edge artificial intelligence paradigm, enables many edge devices to collaboratively train a global model without sharing their private data. To enhance the training efficiency of FL, various algorithms have been proposed, ranging from first-order to second-order methods. However, these algorithms cannot be applied in scenarios where the gradient information is not available, e.g., federated black-box attack and federated hyperparameter tuning. To address this issue, in this paper we propose a derivative-free federated zeroth-order optimization (FedZO) algorithm that performs multiple local updates based on stochastic gradient estimators in each communication round and enables partial device participation. Under the non-convex setting, we derive the convergence performance of the FedZO algorithm and characterize the impact of the numbers of local iterates and participating edge devices on the convergence. To enable communication-efficient FedZO over wireless networks, we further propose an over-the-air computation (AirComp) assisted FedZO algorithm. With an appropriate transceiver design, we show that the convergence of AirComp-assisted FedZO can still be preserved under certain signal-to-noise ratio conditions. Simulation results demonstrate the effectiveness of the FedZO algorithm and validate the theoretical observations.
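
To make the idea of a stochastic gradient estimator in a derivative-free setting concrete, here is a minimal sketch of a generic two-point zeroth-order estimator with Gaussian directions. It is not the FedZO estimator itself, whose exact form the abstract does not give; the names zo_gradient and quadratic, the smoothing parameter mu, and the test function are all assumptions made for illustration.

```python
import random


def zo_gradient(f, x, mu=1e-3, num_samples=1, rng=random):
    """Estimate the gradient of f at x from function values only.

    Generic two-point estimator (illustrative, not the paper's): average
    (f(x + mu*u) - f(x)) / mu * u over random Gaussian directions u.
    """
    d = len(x)
    fx = f(x)
    grad = [0.0] * d
    for _ in range(num_samples):
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        x_pert = [xi + mu * ui for xi, ui in zip(x, u)]
        scale = (f(x_pert) - fx) / mu  # finite difference along u
        for i in range(d):
            grad[i] += scale * u[i]
    return [g / num_samples for g in grad]


def quadratic(x):
    """Hypothetical black-box objective; its true gradient is 2*x."""
    return sum(xi * xi for xi in x)


rng = random.Random(0)  # seeded for reproducibility
estimate = zo_gradient(quadratic, [1.0, -2.0, 0.5], num_samples=20000, rng=rng)
```

With many samples the estimate approaches the true gradient (2, -4, 1) of the quadratic. A single sample is very noisy, which is one reason averaging over local updates and devices matters in zeroth-order methods.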

[96]  arXiv:2201.09562 (cross-list from cs.LG) [pdf, other]
Title: Scalable Safe Exploration for Global Optimization of Dynamical Systems
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)

Learning optimal control policies directly on physical systems is challenging since even a single failure can lead to costly hardware damage. Most existing learning methods that guarantee safety, i.e., no failures, during exploration are limited to local optima. A notable exception is the GoSafe algorithm, which, unfortunately, cannot handle high-dimensional systems and hence cannot be applied to most real-world dynamical systems. This work proposes GoSafeOpt as the first algorithm that can safely discover globally optimal policies for complex systems while giving safety and optimality guarantees. Our experiments on a robot arm that would be prohibitive for GoSafe demonstrate that GoSafeOpt safely finds remarkably better policies than competing safe learning methods for high-dimensional domains.

[97]  arXiv:2201.09563 (cross-list from cs.CV) [pdf]
Title: Debiasing pipeline improves deep learning model generalization for X-ray based lung nodule detection
Comments: 32 pages, 17 figures, 4 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Lung cancer is the leading cause of cancer death worldwide and a good prognosis depends on early diagnosis. Unfortunately, screening programs for the early diagnosis of lung cancer are uncommon. This is in part due to the at-risk groups being located in rural areas far from medical facilities. Reaching these populations would require a scaled approach that combines mobility, low cost, speed, accuracy, and privacy. We can resolve these issues by combining the chest X-ray imaging mode with a federated deep-learning approach, provided that the federated model is trained on homogeneous data to ensure that no single data source can adversely bias the model at any point in time. In this study we show that an image pre-processing pipeline that homogenizes and debiases chest X-ray images can improve both internal classification and external generalization, paving the way for a low-cost and accessible deep learning-based clinical system for lung cancer screening. An evolutionary pruning mechanism is used to train a nodule detection deep learning model on the most informative images from a publicly available lung nodule X-ray dataset. Histogram equalization is used to remove systematic differences in image brightness and contrast. Model training is performed using all combinations of lung field segmentation, close cropping, and rib suppression operators. We show that this pre-processing pipeline results in deep learning models that successfully generalize to an independent lung nodule dataset, using ablation studies to assess the contribution of each operator in this pipeline. By stripping chest X-ray images of known confounding variables through lung field segmentation, along with suppression of signal noise from the bone structure, we can train a highly accurate deep learning lung nodule detection algorithm with outstanding generalization accuracy of 89% to nodule samples in unseen data.
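
The histogram equalization step mentioned in this abstract can be illustrated with a minimal sketch of the textbook global method on an 8-bit grayscale image stored as a list of rows. The abstract does not say which variant the authors use, so this is an assumption; the function name equalize_histogram and the tiny example image are hypothetical.

```python
def equalize_histogram(image, levels=256):
    """Global histogram equalization for a non-empty grayscale image.

    image: list of rows, each a list of integer intensities in [0, levels).
    Returns a new image whose intensities are remapped through the
    normalized cumulative histogram (textbook method, illustrative only).
    """
    pixels = [p for row in image for p in row]
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Cumulative distribution of intensities.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)

    cdf_min = next(c for c in cdf if c > 0)  # CDF at the darkest level present
    total = len(pixels)
    if total == cdf_min:
        # Constant image: nothing to stretch, return a copy.
        return [row[:] for row in image]

    # Lookup table mapping each old intensity to its equalized value.
    lut = [
        max(0, round((c - cdf_min) / (total - cdf_min) * (levels - 1)))
        for c in cdf
    ]
    return [[lut[p] for p in row] for row in image]


example = [[50, 50], [100, 150]]
equalized = equalize_histogram(example)
```

In the example, the darkest intensity present maps to 0 and the brightest to 255, so images acquired with different brightness and contrast settings end up on a comparable scale.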

[98]  arXiv:2201.09592 (cross-list from cs.SD) [pdf, other]
Title: Unsupervised Audio Source Separation Using Differentiable Parametric Source Models
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

Supervised deep learning approaches to underdetermined audio source separation achieve state-of-the-art performance but require a dataset of mixtures along with their corresponding isolated source signals. Such datasets can be extremely costly to obtain for musical mixtures. This raises a need for unsupervised methods. We propose a novel unsupervised model-based deep learning approach to musical source separation. Each source is modelled with a differentiable parametric source-filter model. A neural network is trained to reconstruct the observed mixture as a sum of the sources by estimating the source models' parameters given their fundamental frequencies. At test time, soft masks are obtained from the synthesized source signals. The experimental evaluation on a vocal ensemble separation task shows that the proposed method outperforms learning-free methods based on nonnegative matrix factorization and a supervised deep learning baseline. Integrating domain knowledge in the form of source models into a data-driven method leads to high data efficiency: the proposed approach achieves good separation quality even when trained on less than three minutes of audio. This work makes powerful deep learning based separation usable in scenarios where training data with ground truth is expensive or nonexistent.
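
As an illustration of how soft masks can be obtained from synthesized source signals, the sketch below computes Wiener-like ratio masks from per-source magnitude spectra for one time frame. The abstract does not specify the mask formula, so the power-ratio form, the function names soft_masks and apply_mask, and the numbers are assumptions.

```python
def soft_masks(source_magnitudes, eps=1e-12):
    """Compute one soft mask per source for a single time frame.

    source_magnitudes: list of per-source magnitude spectra (equal length).
    Each mask value is that source's share of the total power in the bin
    (Wiener-like ratio mask; an assumed, common choice).
    """
    num_bins = len(source_magnitudes[0])
    powers = [[m * m for m in src] for src in source_magnitudes]
    # eps avoids division by zero in bins where all sources are silent.
    totals = [sum(p[b] for p in powers) + eps for b in range(num_bins)]
    return [[p[b] / totals[b] for b in range(num_bins)] for p in powers]


def apply_mask(mask, mixture_magnitudes):
    """Scale each mixture bin by the mask to estimate one source."""
    return [m * x for m, x in zip(mask, mixture_magnitudes)]


# Two hypothetical synthesized sources over three frequency bins.
synth = [[3.0, 0.0, 1.0], [4.0, 2.0, 1.0]]
masks = soft_masks(synth)
estimate_0 = apply_mask(masks[0], [5.0, 2.0, 2.0])
```

Because the masks in each bin sum to one (up to eps), the masked estimates add back up to the observed mixture, which keeps the separation consistent with the input.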

[99]  arXiv:2201.09595 (cross-list from cs.HC) [pdf, other]
Title: Towards a Real-time Measure of the Perception of Anthropomorphism in Human-robot Interaction
Journal-ref: MuCAI'21: Proceedings of the 2nd ACM Multimedia Workshop on Multimodal Conversational AI, 2021
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Robotics (cs.RO); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)

How human-like do conversational robots need to look to enable long-term human-robot conversation? One essential aspect of long-term interaction is a human's ability to adapt to the varying degrees of a conversational partner's engagement and emotions. Prosodically, this can be achieved through (dis)entrainment. While speech-synthesis has been a limiting factor for many years, restrictions in this regard are increasingly mitigated. These advancements now emphasise the importance of studying the effect of robot embodiment on human entrainment. In this study, we conducted a between-subjects online human-robot interaction experiment in an educational use-case scenario where a tutor was either embodied through a human or a robot face. 43 English-speaking participants took part in the study for whom we analysed the degree of acoustic-prosodic entrainment to the human or robot face, respectively. We found that the degree of subjective and objective perception of anthropomorphism positively correlates with acoustic-prosodic entrainment.

[100]  arXiv:2201.09598 (cross-list from math.OC) [pdf, other]
Title: On the Optimization Landscape of Dynamical Output Feedback Linear Quadratic Control
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)

The optimization landscape of optimal control problems plays an important role in the convergence of many policy gradient methods. Unlike state-feedback Linear Quadratic Regulator (LQR), static output-feedback policies are typically insufficient to achieve good closed-loop control performance. We investigate the optimization landscape of linear quadratic control using dynamical output feedback policies, denoted as dynamical LQR (dLQR) in this paper. We first show that the dLQR cost varies with similarity transformations. We then derive an explicit form of the optimal similarity transformation for a given observable stabilizing controller. We further characterize the unique observable stationary point of dLQR. This provides an optimality certificate for policy gradient methods under mild assumptions. Finally, we discuss the differences and connections between dLQR and the canonical linear quadratic Gaussian (LQG) control. These results shed light on designing policy gradient algorithms for decision-making problems with partially observed information.

[101]  arXiv:2201.09613 (cross-list from cs.CV) [pdf, other]
Title: SEN12MS-CR-TS: A Remote Sensing Data Set for Multi-modal Multi-temporal Cloud Removal
Journal-ref: IEEE Transactions on Geoscience and Remote Sensing, 2022
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

About half of all optical observations collected via spaceborne satellites are affected by haze or clouds. Consequently, cloud coverage affects the remote sensing practitioner's capabilities of a continuous and seamless monitoring of our planet. This work addresses the challenge of optical satellite image reconstruction and cloud removal by proposing a novel multi-modal and multi-temporal data set called SEN12MS-CR-TS. We propose two models highlighting the benefits and use cases of SEN12MS-CR-TS: First, a multi-modal multi-temporal 3D-Convolution Neural Network that predicts a cloud-free image from a sequence of cloudy optical and radar images. Second, a sequence-to-sequence translation model that predicts a cloud-free time series from a cloud-covered time series. Both approaches are evaluated experimentally, with their respective models trained and tested on SEN12MS-CR-TS. The conducted experiments highlight the contribution of our data set to the remote sensing community as well as the benefits of multi-modal and multi-temporal information to reconstruct noisy information. Our data set is available at https://patrickTUM.github.io/cloud_removal

[102]  arXiv:2201.09622 (cross-list from cs.IT) [pdf, ps, other]
Title: Uplink Performance of High-Mobility Cell-Free Massive MIMO-OFDM Systems
Comments: Accepted in IEEE ICC 2022
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

High-speed train (HST) communications with orthogonal frequency division multiplexing (OFDM) techniques have received significant attention in recent years. Besides, cell-free (CF) massive multiple-input multiple-output (MIMO) is considered a promising technology to achieve the ultimate performance limit. In this paper, we focus on the performance of CF massive MIMO-OFDM systems with both matched filter and large-scale fading decoding (LSFD) receivers in HST communications. HST communications with small cell and cellular massive MIMO-OFDM systems are also analyzed for comparison. Considering the adverse effect of Doppler frequency offset (DFO) on system performance, exact closed-form expressions for uplink spectral efficiency (SE) of all systems are derived. According to the simulation results, we find that the CF massive MIMO-OFDM system with LSFD achieves both larger SE and lower SE drop percentages than other systems. In addition, increasing the number of access points (APs) and antennas per AP can effectively compensate for the performance loss from the DFO. Moreover, there is an optimal vertical distance between APs and HST to achieve the maximum SE.

[103]  arXiv:2201.09685 (cross-list from cs.IT) [pdf, other]
Title: Robust Joint Design for Intelligent Reflecting Surfaces Assisted Cell-Free Networks
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

Intelligent reflecting surfaces (IRSs) have emerged as a promising economical solution to implement cell-free networks. However, the performance gains achieved by IRSs critically depend on smartly tuned passive beamforming based on the assumption that accurate channel state information (CSI) is available at the central processing unit (CPU), which is practically impossible. Thus, in this paper, we investigate the impact of the CSI uncertainty on IRS-assisted cell-free networks. We adopt a stochastic method to cope with the CSI uncertainty by maximizing the expectation of the sum-rate, which guarantees robust performance on average. Accordingly, an average sum-rate maximization problem is formulated, which is non-convex and whose optimal solution is arduous to obtain due to the coupled variables and the expectation operation with respect to CSI uncertainties. As a compromise, we develop an efficient robust joint design algorithm with low complexity. Particularly, the original problem is equivalently transformed into a tractable form by employing algebraic manipulations. Then, the locally optimal solution can be obtained iteratively. We further prove that the CSI uncertainty has no direct impact on the optimization of the passive reflecting beamforming. It is worth noting that the investigated scenario is flexible and general in communications, thus the proposed algorithm can act as a general framework to solve various sum-rate maximization problems. Simulation results demonstrate that IRSs can achieve considerable data rate improvement for conventional cell-free networks, and confirm the resilience of the proposed algorithm against the CSI uncertainty.

[104]  arXiv:2201.09692 (cross-list from cs.SD) [pdf, ps, other]
Title: Improving Factored Hybrid HMM Acoustic Modeling without State Tying
Comments: Accepted for presentation at IEEE ICASSP 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

In this work, we show that a factored hybrid hidden Markov model (FH-HMM) which is defined without any phonetic state-tying outperforms a state-of-the-art hybrid HMM. The factored hybrid HMM provides a link to transducer models in the way it models phonetic (label) context while preserving the strict separation of acoustic and language model of the hybrid HMM approach. Furthermore, we show that the factored hybrid model can be trained from scratch without using phonetic state-tying in any of the training steps. Our modeling approach enables triphone context while avoiding phonetic state-tying by a decomposition into locally normalized factored posteriors for monophones/HMM states in phoneme context. Experimental results are provided for Switchboard 300h and LibriSpeech. On the former task we also show that by avoiding the phonetic state-tying step, the factored hybrid can take better advantage of regularization techniques during training, compared to the standard hybrid HMM with phonetic state-tying based on classification and regression trees (CART).

[105]  arXiv:2201.09699 (cross-list from cs.LG) [pdf, other]
Title: EASY: Ensemble Augmented-Shot Y-shaped Learning: State-Of-The-Art Few-Shot Classification with Simple Ingredients
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Image and Video Processing (eess.IV)

Few-shot learning aims at leveraging knowledge learned by one or more deep learning models, in order to obtain good classification performance on new problems, where only a few labeled samples per class are available. Recent years have seen a fair number of works in the field, introducing methods with numerous ingredients. A frequent problem, though, is the use of suboptimally trained models to extract knowledge, raising questions about whether proposed approaches bring gains compared to using better initial models without the introduced ingredients. In this work, we propose a simple methodology that reaches or even beats state-of-the-art performance on multiple standardized benchmarks of the field, while adding almost no hyperparameters or parameters to those used for training the initial deep learning models on the generic dataset. This methodology offers a new baseline on which to propose (and fairly compare) new techniques or adapt existing ones.

[106]  arXiv:2201.09709 (cross-list from cs.SD) [pdf, other]
Title: Optimizing Tandem Speaker Verification and Anti-Spoofing Systems
Comments: Published in IEEE/ACM Transactions on Audio, Speech, and Language Processing. Published version available at: this https URL
Journal-ref: in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 30, pp. 477-488, 2022
Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

As automatic speaker verification (ASV) systems are vulnerable to spoofing attacks, they are typically used in conjunction with spoofing countermeasure (CM) systems to improve security. For example, the CM can first determine whether the input is human speech, then the ASV can determine whether this speech matches the speaker's identity. The performance of such a tandem system can be measured with a tandem detection cost function (t-DCF). However, ASV and CM systems are usually trained separately, using different metrics and data, which does not optimize their combined performance. In this work, we propose to optimize the tandem system directly by creating a differentiable version of t-DCF and employing techniques from reinforcement learning. The results indicate that these approaches offer better outcomes than finetuning, with our method providing a 20% relative improvement in the t-DCF in the ASVSpoof19 dataset in a constrained setting.

[107]  arXiv:2201.09717 (cross-list from cs.CV) [pdf, other]
Title: Keeping Deep Lithography Simulators Updated: Global-Local Shape-Based Novelty Detection and Active Learning
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Learning-based pre-simulation (i.e., layout-to-fabrication) models have been proposed to predict the fabrication-induced shape deformation from an IC layout to its fabricated circuit. Such models are usually driven by pairwise learning, involving a training set of layout patterns and their reference shape images after fabrication. However, it is expensive and time-consuming to collect the reference shape images of all layout clips for model training and updating. To address the problem, we propose a deep learning-based layout novelty detection scheme to identify novel (unseen) layout patterns, which cannot be well predicted by a pre-trained pre-simulation model. We devise a global-local novelty scoring mechanism to assess the potential novelty of a layout by exploiting two subnetworks: an autoencoder and a pretrained pre-simulation model. The former characterizes the global structural dissimilarity between a given layout and training samples, whereas the latter extracts a latent code representing the fabrication-induced local deformation. By integrating the global dissimilarity with the local deformation boosted by a self-attention mechanism, our model can accurately detect novelties without the ground-truth circuit shapes of test samples. Based on the detected novelties, we further propose two active-learning strategies to sample a reduced number of representative layouts most worthy of being fabricated to acquire their ground-truth circuit shapes. Experimental results demonstrate i) our method's effectiveness in layout novelty detection, and ii) our active-learning strategies' ability to select representative novel layouts for keeping a learning-based pre-simulation model updated.

[108]  arXiv:2201.09725 (cross-list from cs.LG) [pdf]
Title: Machine Learning Algorithms for Prediction of Penetration Depth and Geometrical Analysis of Weld in Friction Stir Spot Welding Process
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Nowadays, manufacturing sectors harness the power of machine learning and data science algorithms to make predictions for the optimization of mechanical and microstructure properties of fabricated mechanical components. The application of these algorithms reduces both the experimental cost and the time of experiments. The present research work is based on the prediction of penetration depth using Supervised Machine Learning algorithms such as Support Vector Machines (SVM), Random Forest Algorithm, and Robust Regression algorithm. Friction Stir Spot Welding (FSSW) was used to join two elements of AA1230 aluminum alloy. The dataset consists of three input parameters: Rotational Speed (rpm), Dwelling Time (seconds), and Axial Load (kN), on which the machine learning models were trained and tested. It was observed that the Robust Regression machine learning algorithm outperformed the rest of the algorithms, achieving a coefficient of determination of 0.96. The research work also highlights the application of image processing techniques to find the geometrical features of the weld formation.

[109]  arXiv:2201.09739 (cross-list from cs.LG) [pdf, other]
Title: Dense Air Quality Maps Using Regressive Facility Location Based Drive By Sensing
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)

Currently, fixed static sensing is a primary way to monitor environmental data like air quality in cities. However, to obtain dense spatial coverage, a large number of static monitors are required, thereby making it a costly option. Dense spatiotemporal coverage can be achieved using only a fraction of the static sensors by deploying them on moving vehicles, known as the drive-by sensing paradigm. The redundancy present in the air quality data can be exploited by processing the sparsely sampled data to impute the remaining unobserved data points using matrix completion techniques. However, the accuracy of imputation is dependent on the extent to which the moving sensors capture the inherent structure of the air quality matrix. Therefore, the challenge is to pick the set of paths (using vehicles) that performs representative sampling in space and time. Most works in the literature on vehicle subset selection focus on maximizing the spatiotemporal coverage by maximizing the number of samples for different locations and time stamps, which is not an effective representative sampling strategy. We present regressive facility location-based drive-by sensing, an efficient vehicle selection framework that incorporates the smoothness in neighboring locations and autoregressive time correlation while selecting the optimal set of vehicles for effective spatiotemporal sampling. We show that the proposed drive-by sensing problem is submodular, thereby lending itself to a greedy algorithm with performance guarantees. We evaluate our framework on selecting a subset from the fleet of public transport in Delhi, India. We illustrate that the proposed method samples more representative spatiotemporal data than the baseline methods, reducing the extrapolation error on the simulated air quality data. Our method, therefore, has the potential to provide cost-effective dense air quality maps.
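
The link between submodularity and a greedy algorithm can be sketched with the classic facility location objective, where each location is credited with its best similarity to any selected candidate. This is the plain textbook objective, not the regressive variant proposed in the paper; the function name greedy_facility_location, the similarity matrix, and the interpretation of rows as vehicles are assumptions for illustration.

```python
def greedy_facility_location(similarity, k):
    """Greedily select up to k candidates for the facility location objective.

    similarity[c][j] >= 0 is how well candidate c (e.g. a vehicle route)
    represents location j. The objective is the sum over locations of the
    best similarity among selected candidates, which is monotone submodular,
    so greedy selection achieves at least a (1 - 1/e) fraction of the optimum.
    """
    num_candidates = len(similarity)
    num_locations = len(similarity[0])
    best_cover = [0.0] * num_locations  # current best similarity per location
    chosen = []
    for _ in range(min(k, num_candidates)):
        best_gain, best_idx = -1.0, None
        for c in range(num_candidates):
            if c in chosen:
                continue
            # Marginal gain of adding candidate c to the current selection.
            gain = sum(
                max(similarity[c][j] - best_cover[j], 0.0)
                for j in range(num_locations)
            )
            if gain > best_gain:
                best_gain, best_idx = gain, c
        chosen.append(best_idx)
        best_cover = [
            max(best_cover[j], similarity[best_idx][j])
            for j in range(num_locations)
        ]
    return chosen, sum(best_cover)


# Three hypothetical vehicles, four locations.
sim = [
    [1.0, 0.9, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.8],
    [0.5, 0.5, 0.5, 0.5],
]
chosen, total = greedy_facility_location(sim, k=2)
```

In this small instance greedy first picks the vehicle that covers every location moderately well and then the one with the largest marginal gain. The result is not optimal (the first two vehicles together score higher), which shows that the guarantee is a bound rather than a promise of optimality.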

[110]  arXiv:2201.09759 (cross-list from cs.NE) [pdf, other]
Title: Exploration of Hyperdimensional Computing Strategies for Enhanced Learning on Epileptic Seizure Detection
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG); Signal Processing (eess.SP)

Wearable and unobtrusive monitoring and prediction of epileptic seizures has the potential to significantly increase the life quality of patients, but is still an unreached goal due to challenges of real-time detection and wearable device design. Hyperdimensional (HD) computing has evolved in recent years as a new promising machine learning approach, especially for wearable applications. But in the case of epilepsy detection, standard HD computing is not performing at the level of other state-of-the-art algorithms. This could be due to the inherent complexity of the seizures and their signatures in different biosignals, such as the electroencephalogram (EEG), the highly personalized nature, and the imbalance of seizure and non-seizure instances. In the literature, different strategies for improved learning of HD computing have been proposed, such as iterative (multi-pass) learning, multi-centroid learning and learning with sample weight ("OnlineHD"). Yet, most of them have not been tested on the challenging task of epileptic seizure detection, and it remains unclear whether they can increase the HD computing performance to the level of the current state-of-the-art algorithms, such as random forests. Thus, in this paper, we implement different learning strategies and assess their performance on an individual basis, or in combination, regarding detection performance and memory and computational requirements. Results show that the best-performing algorithm, which is a combination of multi-centroid and multi-pass, can indeed reach the performance of the random forest model on a highly unbalanced dataset imitating a real-life epileptic seizure detection application.

[111]  arXiv:2201.09786 (cross-list from cs.NI) [pdf, other]
Title: Aerial Energy Provisioning for Massive Energy-Constrained IoT by UAVs
Subjects: Networking and Internet Architecture (cs.NI); Systems and Control (eess.SY)

Autonomy of devices is a major challenge in many Internet of Things (IoT) applications, in particular when the nodes are deployed remotely or in difficult-to-access places. In this paper we present an approach to provide energy to these devices by Unmanned Aerial Vehicles (UAVs). To this end, the two major challenges, finding and charging the node, are presented. We propose a model that gives the energy-constrained node unlimited autonomy by taking the Wireless Power Transfer (WPT) link and battery capacity into account. Selecting the most suitable battery technology allows a reduction in battery capacity and waste. Moreover, an upgrade of existing IoT nodes is feasible with a limited impact on the design and form factor.

Replacements for Tue, 25 Jan 22

[112]  arXiv:2005.09082 (replaced) [pdf]
Title: Quantum Noise of Kramers-Kronig Receiver
Subjects: Quantum Physics (quant-ph); Signal Processing (eess.SP)
[113]  arXiv:2006.14774 (replaced) [pdf, other]
Title: Co-Designing Statistical MIMO Radar and In-band Full-Duplex Multi-User MIMO Communications
Comments: 15 pages, 6 figures
Subjects: Signal Processing (eess.SP); Information Retrieval (cs.IR)
[114]  arXiv:2007.14730 (replaced) [pdf, ps, other]
Title: Amplify-and-Forward Relaying for Hierarchical Over-the-Air Computation
Comments: Hierarchical AirComp Design with AF relaying over a large area; Full paper, 30 pages, and 6 figures
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
[115]  arXiv:2008.02159 (replaced) [pdf, other]
Title: Learning from Sparse Demonstrations
Comments: A real world drone navigation experiment using the proposed method is at the link: this https URL
Subjects: Robotics (cs.RO); Machine Learning (cs.LG); Systems and Control (eess.SY)
[116]  arXiv:2009.10454 (replaced) [pdf, ps, other]
Title: Solving Dynamic Optimization Problems to a Specified Accuracy: An Alternating Approach using Integrated Residuals
Comments: 10 pages, 7 figures
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
[117]  arXiv:2011.05313 (replaced) [pdf, other]
Title: Understanding the physics of coherent LiDAR
Comments: 36 pages, 10 figures
Subjects: Applied Physics (physics.app-ph); Image and Video Processing (eess.IV)
[118]  arXiv:2012.01986 (replaced) [pdf, other]
Title: An Improved Iterative Neural Network for High-Quality Image-Domain Material Decomposition in Dual-Energy CT
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Medical Physics (physics.med-ph)
[119]  arXiv:2101.08987 (replaced) [pdf, other]
Title: Progressive Image Super-Resolution via Neural Differential Equation
Comments: Revision on the title, abstract and main text; Remove figures to fit 4 pages; Initial accepted version of ICASSP 2022
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[120]  arXiv:2102.02828 (replaced) [pdf, other]
Title: Scattering Networks on the Sphere for Scalable and Rotationally Equivariant Spherical CNNs
Comments: 18 pages, 6 figures, accepted by ICLR, code at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Instrumentation and Methods for Astrophysics (astro-ph.IM); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
[121]  arXiv:2103.02313 (replaced) [pdf]
Title: Open community platform for hearing aid algorithm research: open Master Hearing Aid (openMHA)
Comments: 10 pages, 5 figures
Journal-ref: SoftwareX, Volume 17, 2022, 100953, ISSN 2352-7110
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[122]  arXiv:2103.05102 (replaced) [pdf, other]
Title: Self-Supervised Multisensor Change Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
[123]  arXiv:2103.11509 (replaced) [pdf, other]
Title: Scatter Correction in X-ray CT by Physics-Inspired Deep Learning
Subjects: Medical Physics (physics.med-ph); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
[124]  arXiv:2103.12827 (replaced) [pdf, other]
Title: Fisher Task Distance and Its Applications in Neural Architecture Search and Transfer Learning
Subjects: Machine Learning (cs.LG); Image and Video Processing (eess.IV); Machine Learning (stat.ML)
[125]  arXiv:2103.16160 (replaced) [pdf, other]
Title: Data-Driven Predictive Control for Linear Parameter-Varying Systems
Comments: Accepted to 4th IFAC Workshop on Linear Parameter-Varying Systems
Subjects: Systems and Control (eess.SY)
[126]  arXiv:2103.16171 (replaced) [pdf, other]
Title: Fundamental Lemma for Data-Driven Analysis of Linear Parameter-Varying Systems
Comments: Accepted to the 60th Conference on Decision and Control 2021 (CDC2021)
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
[127]  arXiv:2103.16629 (replaced) [pdf, other]
Title: Learning Lipschitz Feedback Policies from Expert Demonstrations: Closed-Loop Guarantees, Generalization and Robustness
Comments: Submitted to the IEEE Open Journal of Control Systems
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY); Optimization and Control (math.OC); Machine Learning (stat.ML)
[128]  arXiv:2104.03186 (replaced) [pdf, other]
Title: Temporal Parallelisation of Dynamic Programming and Linear Quadratic Control
Comments: To appear in IEEE Transactions on Automatic Control
Subjects: Optimization and Control (math.OC); Distributed, Parallel, and Cluster Computing (cs.DC); Systems and Control (eess.SY)
[129]  arXiv:2104.14631 (replaced) [pdf, other]
Title: Text2Video: Text-driven Talking-head Video Synthesis with Personalized Phoneme-Pose Dictionary
Comments: ICASSP 2022
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[130]  arXiv:2105.00177 (replaced) [pdf, other]
Title: Deep Spectrum Cartography: Completing Radio Map Tensors Using Learned Neural Models
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)
[131]  arXiv:2105.10239 (replaced) [pdf, other]
Title: AC-CovidNet: Attention Guided Contrastive CNN for Recognition of Covid-19 in Chest X-Ray Images
Comments: Accepted in Sixth IAPR International Conference on Computer Vision & Image Processing (CVIP2021)
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[132]  arXiv:2106.01813 (replaced) [pdf, other]
Title: Identification of diffusively coupled linear networks through structured polynomial models
Authors: E.M.M. (Lizan) Kivits, Paul M.J. Van den Hof
Subjects: Systems and Control (eess.SY)
[133]  arXiv:2106.05229 (replaced) [pdf, other]
Title: Speech Recovery for Real-World Self-powered Intermittent Devices
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[134]  arXiv:2106.07150 (replaced) [pdf, other]
Title: Selective Listening by Synchronizing Speech with Lips
Comments: Accepted by TASLP
Subjects: Audio and Speech Processing (eess.AS)
[135]  arXiv:2106.07978 (replaced) [pdf]
Title: Pixel-reassignment in Ultrasound Imaging
Journal-ref: Appl. Phys. Lett. 119, 123701 (2021)
Subjects: Medical Physics (physics.med-ph); Image and Video Processing (eess.IV)
[136]  arXiv:2106.08961 (replaced) [pdf, other]
Title: A Direct Slip Ratio Estimation Method based on an Intelligent Tire and Machine Learning
Comments: 12 pages, 25 figures, 2 tables
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
[137]  arXiv:2106.09377 (replaced) [pdf, other]
Title: A New Dissipativity Condition for Asymptotic Stability of Discounted Economic MPC
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
[138]  arXiv:2107.01368 (replaced) [pdf, ps, other]
Title: The coarsest lattice that determines a discrete multidimensional system
Comments: To appear in Mathematics of Control Signals and Systems
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
[139]  arXiv:2107.09574 (replaced) [pdf, other]
Title: Accelerating Edge Intelligence via Integrated Sensing and Communication
Comments: Accepted by IEEE ICC 2022. 7 Pages
Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI)
[140]  arXiv:2107.10043 (replaced) [pdf, other]
Title: KalmanNet: Neural Network Aided Kalman Filtering for Partially Known Dynamics
Comments: TSP Rev1
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Machine Learning (stat.ML)
[141]  arXiv:2108.02574 (replaced) [pdf, other]
Title: Optimal Transport for Unsupervised Denoising Learning
Comments: Preprint, under review (40 pages, 33 figures)
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[142]  arXiv:2108.04585 (replaced) [pdf, other]
Title: Recurrent Neural Network-based Internal Model Control design for stable nonlinear systems
Comments: This work has been submitted to Elsevier European Journal of Control for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
[143]  arXiv:2108.07893 (replaced) [pdf, other]
Title: Failure of the simultaneous block diagonalization technique applied to complete and cluster synchronization of random networks
Comments: Accepted for publication in PRE
Subjects: Disordered Systems and Neural Networks (cond-mat.dis-nn); Systems and Control (eess.SY)
[144]  arXiv:2109.04405 (replaced) [pdf, other]
Title: An Accelerated Proximal Gradient-based Model Predictive Control Algorithm
Authors: Jia Wang, Ying Yang
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
[145]  arXiv:2109.08555 (replaced) [pdf, other]
Title: Continuous Streaming Multi-Talker ASR with Dual-path Transducers
Comments: Accepted for publication at IEEE ICASSP 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[146]  arXiv:2109.09738 (replaced) [pdf, other]
Title: An Optimal Control Framework for Joint-channel Parallel MRI Reconstruction without Coil Sensitivities
Comments: 13 pages
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Optimization and Control (math.OC)
[147]  arXiv:2109.12239 (replaced) [pdf, other]
Title: Fast Successive-Cancellation List Flip Decoding of Polar Codes
Comments: Published in IEEE Access, Volume: 10, Page(s): 5568 - 5584, Date of Publication: 04 January 2022
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
[148]  arXiv:2109.12591 (replaced) [pdf, other]
Title: Joint magnitude estimation and phase recovery using Cycle-in-Cycle GAN for non-parallel speech enhancement
Comments: Accepted by ICASSP 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[149]  arXiv:2109.13419 (replaced) [pdf, other]
Title: The Role of Lookahead and Approximate Policy Evaluation in Reinforcement Learning with Linear Value Function Approximation
Comments: 19 pages, 4 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
[150]  arXiv:2110.00745 (replaced) [pdf, other]
Title: End-to-End Complex-Valued Multidilated Convolutional Neural Network for Joint Acoustic Echo Cancellation and Noise Suppression
Comments: To be presented at the 2022 International Conference on Acoustics, Speech, & Signal Processing (ICASSP)
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[151]  arXiv:2110.01900 (replaced) [pdf, other]
Title: DistilHuBERT: Speech Representation Learning by Layer-wise Distillation of Hidden-unit BERT
Comments: Accepted to ICASSP 2022
Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[152]  arXiv:2110.03151 (replaced) [pdf, other]
Title: Transcribe-to-Diarize: Neural Speaker Diarization for Unlimited Number of Speakers using End-to-End Speaker-Attributed ASR
Comments: To appear in ICASSP 2022; System labels (SC and VBx) in Table 1 have been fixed
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[153]  arXiv:2110.03183 (replaced) [pdf, other]
Title: Attention is All You Need? Good Embeddings with Statistics are enough:Large Scale Audio Understanding without Transformers/ Convolutions/ BERTs/ Mixers/ Attention/ RNNs or ....
Authors: Prateek Verma
Comments: more grammar corrections; better figure captions; more content edits; SPL
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[154]  arXiv:2110.03283 (replaced) [pdf, other]
Title: Experimental investigation on STFT phase representations for deep learning-based dysarthric speech detection
Comments: Accepted in ICASSP 2022
Subjects: Audio and Speech Processing (eess.AS)
[155]  arXiv:2110.05777 (replaced) [pdf, other]
Title: Large-scale Self-Supervised Speech Representation Learning for Automatic Speaker Verification
Comments: Accepted by ICASSP 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[156]  arXiv:2110.05840 (replaced) [pdf, other]
Title: A bridge between features and evidence for binary attribute-driven perfect privacy
Comments: ICASSP 2022
Subjects: Cryptography and Security (cs.CR); Audio and Speech Processing (eess.AS)
[157]  arXiv:2110.05904 (replaced) [pdf, other]
Title: Video Is Graph: Structured Graph Module for Video Action Recognition
Comments: 9 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[158]  arXiv:2110.06467 (replaced) [pdf, other]
Title: Dual-branch Attention-In-Attention Transformer for single-channel speech enhancement
Comments: Accepted by ICASSP 2022
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[159]  arXiv:2110.09658 (replaced) [pdf, other]
Title: System Norm Regularization Methods for Koopman Operator Approximation
Comments: 26 pages, 14 figures, submitted to SIAM Journal on Applied Dynamical Systems (SIADS)
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Dynamical Systems (math.DS)
[160]  arXiv:2110.09992 (replaced) [pdf, other]
Title: ERQA: Edge-Restoration Quality Assessment for Video Super-Resolution
Comments: Accepted for presentation at the International Conference on Computer Vision Theory and Applications (VISAPP) 2022
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[161]  arXiv:2110.13900 (replaced) [pdf, other]
Title: WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[162]  arXiv:2110.14578 (replaced) [pdf, other]
Title: Spatio-Temporal Federated Learning for Massive Wireless Edge Networks
Comments: 3 figures, conference
Subjects: Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
[163]  arXiv:2111.01758 (replaced) [pdf]
Title: Universal Path Gain Laws for Common Wireless Communication Environments
Comments: accepted for publication in IEEE Transactions on Antennas and Propagation
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
[164]  arXiv:2111.05790 (replaced) [pdf, other]
Title: Early Myocardial Infarction Detection over Multi-view Echocardiography
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Medical Physics (physics.med-ph)
[165]  arXiv:2111.08334 (replaced) [pdf, other]
Title: Pansharpening by convolutional neural networks in the full resolution framework
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[166]  arXiv:2111.10480 (replaced) [pdf, other]
Title: TransMorph: Transformer for unsupervised medical image registration
Comments: 37 pages, 31 figures. Experiments on relative, learnable, and sinusoidal positional embeddings have been included
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[167]  arXiv:2112.08849 (replaced) [pdf, ps, other]
Title: Joint Waveform and Filter Designs for STAP-SLP-based MIMO-DFRC Systems
Comments: accepted by IEEE Journal on Selected Areas in Communications
Subjects: Signal Processing (eess.SP)
[168]  arXiv:2112.10379 (replaced) [pdf]
Title: Lane Departure Prediction Based on Closed-Loop Vehicle Dynamics
Comments: 12 pages, 23 figures
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
[169]  arXiv:2201.01669 (replaced) [pdf, other]
Title: Using Deep Learning with Large Aggregated Datasets for COVID-19 Classification from Cough
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[170]  arXiv:2201.03221 (replaced) [pdf, other]
Title: Optimization of Network Throughput of Joint Radar Communication System Using Stochastic Geometry
Subjects: Signal Processing (eess.SP)
[171]  arXiv:2201.05213 (replaced) [pdf, other]
Title: Parallel Neural Local Lossless Compression
Subjects: Image and Video Processing (eess.IV); Machine Learning (cs.LG); Machine Learning (stat.ML)
[172]  arXiv:2201.06819 (replaced) [pdf, other]
Title: Sensor Scheduling Design for Complex Networks under a Distributed State Estimation Framework
Subjects: Systems and Control (eess.SY)
[173]  arXiv:2201.08169 (replaced) [pdf, other]
Title: Secure Rate-Splitting for the MIMO Broadcast Channel with Imperfect CSIT and a Jammer
Comments: 5 pages, 3 figures
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)