Electrical Engineering and Systems Science

New submissions for Thu, 25 Apr 24

[1]  arXiv:2404.15278 [pdf, other]
Title: Security-Sensitive Task Offloading in Integrated Satellite-Terrestrial Networks
Subjects: Signal Processing (eess.SP); Cryptography and Security (cs.CR); Networking and Internet Architecture (cs.NI)

With the rapid development of sixth-generation (6G) communication technology, global communication networks are moving towards the goal of comprehensive and seamless coverage. In particular, low earth orbit (LEO) satellites have become a critical component of satellite communication networks. The emergence of LEO satellites has brought about new computational resources known as the LEO satellite edge, enabling ground users (GUs) to offload computing tasks to the resource-rich LEO satellite edge. However, existing LEO satellite computational offloading solutions primarily focus on optimizing system performance, neglecting the potential issue of malicious satellite attacks during task offloading. In this paper, we propose deploying the LEO satellite edge in an integrated satellite-terrestrial network (ISTN) structure to support security-sensitive computing task offloading. We model the task allocation and offloading order problem as a joint optimization problem to minimize task offloading delay, energy consumption, and the number of attacks while satisfying reliability constraints. To achieve this objective, we model the task offloading process as a Markov decision process (MDP) and propose a security-sensitive task offloading strategy optimization algorithm based on proximal policy optimization (PPO). Experimental results demonstrate that our algorithm significantly outperforms other benchmark methods.

[2]  arXiv:2404.15279 [pdf, other]
Title: Jointly Modeling Spatio-Temporal Features of Tactile Signals for Action Classification
Comments: Accepted by AAAI 2024
Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI)

Tactile signals collected by wearable electronics are essential in modeling and understanding human behavior. One of the main applications of tactile signals is action classification, especially in healthcare and robotics. However, existing tactile classification methods fail to capture the spatial and temporal features of tactile signals simultaneously, which results in sub-optimal performance. In this paper, we design the Spatio-Temporal Aware tactility Transformer (STAT) to utilize continuous tactile signals for action classification. We propose spatial and temporal embeddings along with a new temporal pretraining task in our model, which aims to enhance the transformer in modeling the spatio-temporal features of tactile signals. Specifically, the designed temporal pretraining task is to differentiate the time order of tubelet inputs to model the temporal properties explicitly. Experimental results on a public action classification dataset demonstrate that our model outperforms state-of-the-art methods in all metrics.

[3]  arXiv:2404.15284 [pdf, other]
Title: Global 4D Ionospheric STEC Prediction based on DeepONet for GNSS Rays
Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI)

The ionosphere is a highly dynamic region of charged particles in the Earth's upper atmosphere, playing a crucial role in applications such as radio communication and satellite navigation. The Slant Total Electron Content (STEC) is an important parameter for characterizing wave propagation, representing the integrated electron density along the ray of radio signals passing through the ionosphere. Accurate prediction of STEC is essential for mitigating the ionospheric impact, particularly on Global Navigation Satellite Systems (GNSS). In this work, we propose a high-precision STEC prediction model named DeepONet-STEC, which learns nonlinear operators to predict the 4D temporal-spatial integrated parameter for specified ground station-satellite ray paths globally. As a demonstration, we validate the performance of the model based on GNSS observation data for global and US-CORS regimes under ionospheric quiet and storm conditions. The DeepONet-STEC results show that three-day (72-hour) prediction in quiet periods can achieve high accuracy using observation data from Precise Point Positioning (PPP) with a temporal resolution of 30 s. During active solar magnetic storm periods, DeepONet-STEC also demonstrated robustness and superiority over traditional deep learning methods. This work presents a neural operator regression architecture for predicting the 4D temporal-spatial ionospheric parameter for satellite navigation system performance, which may be further extended to various space applications and beyond.

[4]  arXiv:2404.15287 [pdf, other]
Title: A Semi-automatic Cranial Implant Design Tool Based on Rigid ICP Template Alignment and Voxel Space Reconstruction
Comments: 6 pages
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

In traumatic medical emergencies, patients depend heavily on cranioplasty, the craft of neurocranial repair using cranial implants. Despite the improvements made in recent years, the design of a patient-specific implant (PSI) is among the most complex, expensive, and least automated tasks in cranioplasty. Further research in this area is needed. Therefore, we created a prototype application with a graphical user interface (UI) specifically tailored for semi-automatic implant generation, where the users only need to perform high-level actions. A general outline of the proposed implant generation process involves setting an area of interest, aligning the templates, and then creating the implant in voxel space. Furthermore, we show that the alignment can be improved significantly by considering only clipped geometry in the vicinity of the defect border. The software prototype will be open-sourced at https://github.com/3Descape/Cranial_Implant_Design

[5]  arXiv:2404.15289 [pdf, other]
Title: EEGDiR: Electroencephalogram denoising network for temporal information storage and global modeling through Retentive Network
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

Electroencephalogram (EEG) signals play a pivotal role in clinical medicine, brain research, and neurological disease studies. However, susceptibility to various physiological and environmental artifacts introduces noise in recorded EEG data, impeding accurate analysis of underlying brain activity. Denoising techniques are crucial to mitigate this challenge. Recent advancements in deep learning-based approaches exhibit substantial potential for enhancing the signal-to-noise ratio of EEG data compared to traditional methods. In the realm of large-scale language models (LLMs), the Retentive Network (Retnet) infrastructure, prevalent for some models, demonstrates robust feature extraction and global modeling capabilities. Recognizing the temporal similarities between EEG signals and natural language, we introduce the Retnet from natural language processing to EEG denoising. This integration presents a novel approach to EEG denoising, opening avenues for a profound understanding of brain activities and accurate diagnosis of neurological diseases. Nonetheless, direct application of Retnet to EEG denoising is infeasible because EEG signals are one-dimensional, whereas natural language processing deals with two-dimensional data. To facilitate Retnet application to EEG denoising, we propose the signal embedding method, transforming one-dimensional EEG signals into two dimensions for use as network inputs. Experimental results validate the substantial improvement in denoising effectiveness achieved by the proposed method.

[6]  arXiv:2404.15290 [pdf, ps, other]
Title: A point cloud processing method of mmWave radar over automotive scenario
Subjects: Signal Processing (eess.SP)

This paper describes in detail an effective method for comprehensive target assessment using the radar RA map and the point cloud map. Combining these different radar outputs makes it possible to effectively determine the road boundary and the relative coordinates of the target, avoid output errors caused by processing excessive information, and greatly improve the efficiency of DBSCAN processing of the measured target.

[7]  arXiv:2404.15292 [pdf, other]
Title: Multi-objective Optimization for Multi-UAV-assisted Mobile Edge Computing
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)

Recent developments in unmanned aerial vehicles (UAVs) and mobile edge computing (MEC) have provided users with flexible and resilient computing services. However, meeting the computing-intensive and latency-sensitive demands of users poses a significant challenge due to the limited resources of UAVs. To address this challenge, we present a multi-objective optimization approach for multi-UAV-assisted MEC systems. First, we formulate a multi-objective optimization problem aiming at minimizing the total task completion delay, reducing the total UAV energy consumption, and maximizing the total amount of offloaded tasks by jointly optimizing task offloading, computation resource allocation, and UAV trajectory control. Since the problem is a challenging mixed-integer non-linear programming (MINLP) problem that is NP-hard, we propose a joint task offloading, computation resource allocation, and UAV trajectory control (JTORATC) approach to solve it. Because the decision variables of task offloading, computation resource allocation, and UAV trajectory control are coupled with each other, the original problem is split into three sub-problems, i.e., task offloading, computation resource allocation, and UAV trajectory control, which are solved individually to obtain the corresponding decisions. The task offloading sub-problem is solved using distributed splitting and threshold rounding methods, the computation resource allocation sub-problem is solved using the Karush-Kuhn-Tucker (KKT) method, and the UAV trajectory control sub-problem is solved using the successive convex approximation (SCA) method. Simulation results show that the proposed JTORATC has superior performance compared to the other benchmark methods.

[8]  arXiv:2404.15293 [pdf, other]
Title: Interactive Manipulation and Visualization of 3D Brain MRI for Surgical Training
Subjects: Image and Video Processing (eess.IV); Graphics (cs.GR); Neurons and Cognition (q-bio.NC)

In modern medical diagnostics, magnetic resonance imaging (MRI) is an important technique that provides detailed insights into anatomical structures. In this paper, we present a comprehensive methodology focusing on streamlining the segmentation, reconstruction, and visualization process of 3D MRI data. Segmentation involves the extraction of anatomical regions with the help of state-of-the-art deep learning algorithms. Then, 3D reconstruction converts segmented data from the previous step into multiple 3D representations. Finally, the visualization stage provides efficient and interactive presentations of both 2D and 3D MRI data. Integrating these three steps, the proposed system is able to augment the interpretability of the anatomical information from MRI scans, according to our interviews with doctors. Even though this system was originally designed and implemented as part of a human brain haptic feedback simulation for surgeon training, it can also provide experienced medical practitioners with an effective tool for clinical data analysis, surgical planning, and other purposes.

[9]  arXiv:2404.15294 [pdf, ps, other]
Title: Multimodal Physical Fitness Monitoring (PFM) Framework Based on TimeMAE-PFM in Wearable Scenarios
Comments: 5 pages, 6 figures
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

Physical function monitoring (PFM) plays a crucial role in healthcare especially for the elderly. Traditional assessment methods such as the Short Physical Performance Battery (SPPB) have failed to capture the full dynamic characteristics of physical function. Wearable sensors such as smart wristbands offer a promising solution to this issue. However, challenges exist, such as the computational complexity of machine learning methods and inadequate information capture. This paper proposes a multi-modal PFM framework based on an improved TimeMAE, which compresses time-series data into a low-dimensional latent space and integrates a self-enhanced attention module. This framework achieves effective monitoring of physical health, providing a solution for real-time and personalized assessment. The method is validated using the NHATS dataset, and the results demonstrate an accuracy of 70.6% and an AUC of 82.20%, surpassing other state-of-the-art time-series classification models.

[10]  arXiv:2404.15297 [pdf, ps, other]
Title: Multi-stream Transmission for Directional Modulation Network via distributed Multi-UAV-aided Multi-IRS
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT); Machine Learning (cs.LG)

Active intelligent reflecting surface (IRS) is a revolutionary technique for future 6G networks. Conventional far-field single-IRS-aided directional modulation (DM) networks have only one (no direct path) or two (with a direct path) degrees of freedom (DoFs). This means that only one or two streams can be transmitted simultaneously from the base station to the user, which seriously limits the rate gain achieved by the IRS. How can more than two DoFs be created for DM? In this paper, a single large-scale IRS is divided into multiple small IRSs, and a novel multi-IRS-aided multi-stream DM network is proposed to achieve point-to-point multi-stream transmission by creating $K$ ($\geq3$) DoFs, where multiple small IRSs are placed distributively via multiple unmanned aerial vehicles (UAVs). Null-space projection, zero-forcing (ZF), and phase alignment are adopted to design the transmit beamforming vector, receive beamforming vector, and phase shift matrix (PSM), respectively; this method is called NSP-ZF-PA. Here, $K$ PSMs and their corresponding beamforming vectors are independently optimized. The weighted minimum mean-square error (WMMSE) algorithm is used in alternating iteration over the optimization variables by introducing a power constraint on the IRS, named WMMSE-PC, where the majorization-minimization (MM) algorithm is used to solve the total PSM. To achieve lower computational complexity, a maximum trace method, called Max-TR-SVD, is proposed by optimizing the PSM of all IRSs. Numerical simulation results show that the proposed NSP-ZF-PA performs much better than Max-TR-SVD in terms of rate. In particular, the rate of NSP-ZF-PA with sixteen small IRSs is about five times that of NSP-ZF-PA with all small IRSs combined into a single large IRS. Thus, a dramatic rate enhancement may be achieved by multiple distributed IRSs.

[11]  arXiv:2404.15302 [pdf, other]
Title: Robust Phase Retrieval by Alternating Minimization
Subjects: Signal Processing (eess.SP); Optimization and Control (math.OC); Statistics Theory (math.ST)

We consider a least absolute deviation (LAD) approach to the robust phase retrieval problem that aims to recover a signal from its absolute measurements corrupted with sparse noise. To solve the resulting non-convex optimization problem, we propose a robust alternating minimization (Robust-AM) method derived as an unconstrained Gauss-Newton method. To solve the inner optimization arising in each step of Robust-AM, we adopt two computationally efficient methods for linear programs. We provide a non-asymptotic convergence analysis of these practical algorithms for Robust-AM under the standard Gaussian measurement assumption. These algorithms, when suitably initialized, are guaranteed to converge linearly to the ground truth at an order-optimal sample complexity with high probability while the support of sparse noise is arbitrarily fixed and the sparsity level is no larger than $1/4$. Additionally, through comprehensive numerical experiments on synthetic and image datasets, we show that Robust-AM outperforms existing methods for robust phase retrieval offering comparable theoretical performance.
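
As a reading aid for the abstract above, the LAD formulation and one alternating step can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not necessarily the authors' exact formulation: the real-valued measurement model, the symbols $a_i$, $x_\star$, $\eta_i$, $s_i^{(t)}$, $u_i$, and the sign-fixing form of the update are all assumptions introduced here. The last line shows the standard way an LAD regression is written as a linear program.

```latex
% Assumed real-valued measurement model with sparse corruption \eta
y_i = |\langle a_i, x_\star \rangle| + \eta_i, \qquad i = 1, \dots, m

% Least absolute deviation (LAD) objective, non-convex in x
\min_{x \in \mathbb{R}^n} \; \frac{1}{m} \sum_{i=1}^{m}
  \bigl|\, y_i - |\langle a_i, x \rangle| \,\bigr|

% One alternating step (assumed form): fix the signs, then solve an LAD regression
s_i^{(t)} = \operatorname{sgn}\langle a_i, x^{(t)} \rangle, \qquad
x^{(t+1)} \in \arg\min_{x} \sum_{i=1}^{m}
  \bigl|\, s_i^{(t)} y_i - \langle a_i, x \rangle \,\bigr|

% The inner LAD regression as a linear program in (x, u)
\min_{x,\, u} \; \sum_{i=1}^{m} u_i
\quad \text{s.t.} \quad
-u_i \le s_i^{(t)} y_i - \langle a_i, x \rangle \le u_i, \quad i = 1, \dots, m
```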

[12]  arXiv:2404.15305 [pdf, other]
Title: ADAPT^2: Adapting Pre-Trained Sensing Models to End-Users via Self-Supervision Replay
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

Self-supervised learning has emerged as a method for utilizing massive unlabeled data for pre-training models, providing an effective feature extractor for various mobile sensing applications. However, when deployed to end-users, these models encounter significant domain shifts attributed to user diversity. We investigate the performance degradation that occurs when self-supervised models are fine-tuned in heterogeneous domains. To address the issue, we propose ADAPT^2, a few-shot domain adaptation framework for personalizing self-supervised models. ADAPT^2 proposes self-supervised meta-learning for initial model pre-training, followed by a user-side model adaptation by replaying the self-supervision with user-specific data. This allows models to adjust their pre-trained representations to the user with only a few samples. Evaluation with four benchmarks demonstrates that ADAPT^2 outperforms existing baselines by an average F1-score of 8.8%p. Our on-device computational overhead analysis on a commodity off-the-shelf (COTS) smartphone shows that ADAPT^2 completes adaptation within an unobtrusive latency (in three minutes) with only a 9.54% memory consumption, demonstrating the computational efficiency of the proposed method.

[13]  arXiv:2404.15307 [pdf, other]
Title: DCAE-SR: Design of a Denoising Convolutional Autoencoder for reconstructing Electrocardiograms signals at Super Resolution
Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI)

Electrocardiogram (ECG) signals play a pivotal role in cardiovascular diagnostics, providing essential information on the electrical activity of the heart. However, the inherent noise and limited resolution in ECG recordings can hinder accurate interpretation and diagnosis. In this paper, we propose a novel model for ECG super resolution (SR) that uses a denoising convolutional autoencoder (DCAE) to enhance temporal and frequency information inside ECG signals. Our approach addresses the limitations of traditional ECG signal processing techniques. Our model takes as input 5-second ECG windows sampled at 50 Hz (very low resolution) and is able to reconstruct a denoised super-resolution signal with a x10 upsampling rate (sampled at 500 Hz). We trained the proposed DCAE-SR on publicly available myocardial infarction ECG signals. Our method demonstrates superior performance in reconstructing high-resolution ECG signals from very low-resolution signals with a sampling rate of 50 Hz. We compared our results with current deep-learning approaches in the literature for ECG super-resolution and with some reproducible non-deep-learning methods that can perform both super-resolution and denoising. We obtained state-of-the-art performance in super-resolution of very low-resolution ECG signals frequently corrupted by ECG artifacts. We were able to obtain a signal-to-noise ratio of 12.20 dB (outperforming the previous 4.68 dB), a mean squared error of 0.0044 (outperforming the previous 0.0154), and a root mean squared error of 4.86% (outperforming the previous 12.40%). In conclusion, our DCAE-SR model offers a robust (to artefact presence), versatile, and explainable solution to enhance the quality of ECG signals. This advancement holds promise for the field of cardiovascular diagnostics, paving the way for improved patient care and high-quality clinical decisions.
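
To make the numbers in the abstract above concrete, the following minimal sketch works out the window sizes implied by a 5-second window at 50 Hz with x10 upsampling, and shows common textbook definitions of the three reported metrics. The abstract does not state how its metrics are normalized (for example, how the RMSE is expressed as a percentage), so the function names and definitions here are assumptions for illustration, not the authors' evaluation code.

```python
import math


def window_lengths(duration_s, fs_in_hz, upsampling_factor):
    """Return (input samples, output samples) for one signal window."""
    n_in = int(round(duration_s * fs_in_hz))
    return n_in, n_in * upsampling_factor


def mse(reference, estimate):
    """Mean squared error between two equal-length sequences."""
    if len(reference) != len(estimate):
        raise ValueError("sequences must have the same length")
    return sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)


def rmse(reference, estimate):
    """Root mean squared error (plain, not expressed as a percentage)."""
    return math.sqrt(mse(reference, estimate))


def snr_db(reference, estimate):
    """Signal-to-noise ratio in dB, treating (reference - estimate) as noise."""
    signal_power = sum(r * r for r in reference)
    noise_power = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if noise_power == 0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(signal_power / noise_power)


if __name__ == "__main__":
    # 5 s at 50 Hz in, x10 upsampling out (values taken from the abstract)
    print(window_lengths(5, 50, 10))
```

With the values quoted in the abstract, each input window holds 250 samples and each reconstructed window holds 2500 samples.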

[14]  arXiv:2404.15308 [pdf, ps, other]
Title: Label-Efficient Sleep Staging Using Transformers Pre-trained with Position Prediction
Comments: 4 pages, 1 figure. This work was presented at the IEEE International Conference on AI for Medicine, Health, and Care 2024
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

Sleep staging is a clinically important task for diagnosing various sleep disorders, but remains challenging to deploy at scale because it is both labor-intensive and time-consuming. Supervised deep learning-based approaches can automate sleep staging but at the expense of large labeled datasets, which can be infeasible to procure for various settings, e.g., uncommon sleep disorders. While self-supervised learning (SSL) can mitigate this need, recent studies on SSL for sleep staging have shown that performance gains saturate after training with labeled data from only tens of subjects, and hence are unable to match the peak performance attained with larger datasets. We hypothesize that the rapid saturation stems from applying a sub-optimal pretraining scheme that pretrains only a portion of the architecture, i.e., the feature encoder, but not the temporal encoder; therefore, we propose adopting an architecture that seamlessly couples the feature and temporal encoding and a suitable pretraining scheme that pretrains the entire model. On a sample sleep staging dataset, we find that the proposed scheme offers performance gains that do not saturate with the amount of labeled training data (e.g., 3-5% improvement in balanced sleep staging accuracy across low- to high-labeled data settings), reducing the amount of labeled training data needed for high performance (e.g., by 800 subjects). Based on our findings, we recommend adopting this SSL paradigm for subsequent work on SSL for sleep staging.

[15]  arXiv:2404.15309 [pdf, other]
Title: Sparse Bayesian Correntropy Learning for Robust Muscle Activity Reconstruction from Noisy Brain Recordings
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Neurons and Cognition (q-bio.NC)

Sparse Bayesian learning has promoted many effective frameworks for brain activity decoding, especially for the reconstruction of muscle activity. However, existing sparse Bayesian learning mainly employs a Gaussian distribution as the error assumption in the reconstruction task, which does not necessarily hold in real-world applications. Moreover, brain recordings are known to be highly noisy and to contain substantial non-Gaussian noise, which can lead to significant performance degradation for sparse Bayesian learning methods. The goal of this paper is to propose a new robust implementation of sparse Bayesian learning, so that robustness and sparseness can be realized simultaneously. Motivated by the strong robustness of the maximum correntropy criterion (MCC), we propose an integration of MCC into the sparse Bayesian learning regime. Specifically, we derive the explicit error assumption inherent in the MCC and then leverage it for the likelihood function. Meanwhile, we use the automatic relevance determination (ARD) technique for the sparse prior distribution. To fully evaluate the proposed method, a synthetic dataset and a real-world muscle activity reconstruction task with two different brain modalities were employed. Experimental results show that our proposed sparse Bayesian correntropy learning framework significantly improves robustness in a noisy regression task. The proposed method achieves a higher correlation coefficient and a lower root mean squared error in the real-world muscle activity reconstruction tasks. Sparse Bayesian correntropy learning provides a powerful tool for neural decoding which can promote the development of brain-computer interfaces.

[16]  arXiv:2404.15311 [pdf, other]
Title: Fusing Pretrained ViTs with TCNet for Enhanced EEG Regression
Comments: Accepted HCI International 2024
Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

The task of Electroencephalogram (EEG) analysis is paramount to the development of Brain-Computer Interfaces (BCIs). However, reaching the goal of developing robust, useful BCIs depends heavily on the speed and accuracy with which BCIs can understand neural dynamics. In response to that goal, this paper details the integration of pre-trained Vision Transformers (ViTs) with Temporal Convolutional Networks (TCNet) to enhance the precision of EEG regression. The core of this approach lies in harnessing the sequential data processing strengths of ViTs along with the superior feature extraction capabilities of TCNet, to significantly improve EEG analysis accuracy. In addition, we analyze how to construct optimal patches for the attention mechanism, balancing the tradeoff between speed and accuracy. Our results showcase a substantial improvement in regression accuracy, as evidenced by the reduction of Root Mean Square Error (RMSE) from 55.4 to 51.8 on EEGEyeNet's Absolute Position Task, outperforming existing state-of-the-art models. Without sacrificing performance, we increase the speed of this model by an order of magnitude (up to 4.32x faster). This breakthrough not only sets a new benchmark in EEG regression analysis but also opens new avenues for future research in the integration of transformer architectures with specialized feature extraction methods for diverse EEG datasets.

[17]  arXiv:2404.15312 [pdf, other]
Title: Realtime Person Identification via Gait Analysis
Subjects: Signal Processing (eess.SP); Computer Vision and Pattern Recognition (cs.CV)

Each person has a unique gait, i.e., walking style, that can be used as a biometric for personal identification. Recent works have demonstrated effective gait recognition using deep neural networks; however, most of these works focus on classification accuracy rather than model efficiency. In order to perform gait recognition using wearable devices on the edge, it is imperative to develop highly efficient low-power models that can be deployed onto small form-factor devices such as microcontrollers. In this paper, we propose a small CNN model with 4 layers that is well suited to edge AI deployment and realtime gait recognition. This model was trained on a public gait dataset with 20 classes augmented with data collected by the authors, for 24 classes in total. Our model achieves 96.7% accuracy and consumes only 5 KB of RAM with an inference time of 70 ms and 125 mW of power, while running continuous inference on an Arduino Nano 33 BLE Sense. We successfully demonstrated realtime identification of the authors with the model running on the Arduino, thus underscoring the efficacy of the approach and providing a proof of feasibility for deployment in practical systems in the near future.

[18]  arXiv:2404.15314 [pdf, other]
Title: Detection of direct path component absence in NLOS UWB channel
Comments: The dataset used in the study is available at Zenodo: Marcin Kolakowski. (2021). UWB Channel Impulse Responses Registered in a Furnished Apartment (Version 1.0) [Data set]. Zenodo. Originally presented at 2018 22nd International Microwave and Radar Conference (MIKON), Poznan, Poland, 2018
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT); Machine Learning (cs.LG)

In this paper, a novel NLOS (Non-Line-of-Sight) identification technique is proposed. In contrast to other methods described in the literature, it distinguishes the situation in which the delayed direct path component is available from the one in which it is totally blocked, where the introduced biases are much higher and harder to mitigate. In the method, NLOS identification is performed using a Support Vector Machine (SVM) algorithm based on various signal features. The paper includes a description of the method and the results of the experiment performed.

[19]  arXiv:2404.15319 [pdf, other]
Title: The largest EEG-based BCI reproducibility study for open science: the MOABB benchmark
Comments: 43 pages, 13 figures, 5 tables
Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Neurons and Cognition (q-bio.NC)

Objective. This study conducts an extensive brain-computer interface (BCI) reproducibility analysis on open electroencephalography datasets, aiming to assess existing solutions and establish open and reproducible benchmarks for effective comparison within the field. The need for such a benchmark lies in the rapid industrial progress that has given rise to undisclosed proprietary solutions. Furthermore, the scientific literature is dense, often featuring challenging-to-reproduce evaluations, making comparisons between existing approaches arduous.
Approach. Within an open framework, 30 machine learning pipelines (separated into raw signal: 11, Riemannian: 13, deep learning: 6) are meticulously re-implemented and evaluated across 36 publicly available datasets, including motor imagery (14), P300 (15), and SSVEP (7). The analysis incorporates statistical meta-analysis techniques for results assessment, encompassing execution time and environmental impact considerations.
Main results. The study yields principled and robust results applicable to various BCI paradigms, emphasizing motor imagery, P300, and SSVEP. Notably, Riemannian approaches utilizing spatial covariance matrices exhibit superior performance, underscoring the necessity for significant data volumes to achieve competitive outcomes with deep learning techniques. The comprehensive results are openly accessible, paving the way for future research to further enhance reproducibility in the BCI domain.
Significance. The significance of this study lies in its contribution to establishing a rigorous and transparent benchmark for BCI research, offering insights into optimal methodologies and highlighting the importance of reproducibility in driving advancements within the field.

[20]  arXiv:2404.15321 [pdf, other]
Title: Characteristics-Based Design of Multi-Exponent Bandpass Filters
Comments: 14 pages, 5 figures, 2 tables, 62 equations. Submitted to IEEE Transactions on Circuits and Systems I: Regular Papers
Subjects: Signal Processing (eess.SP); Sound (cs.SD); Audio and Speech Processing (eess.AS)

We develop methods to design bandpass filters given desired characteristics such as peak frequency, bandwidth, and group delay. We develop this filter design method for filters we refer to as Generalized Auditory Filters (GAFs) which are represented as second order filters raised to non-unitary exponents and hence have three degrees of freedom. Our method for filter design accommodates specification of a trio of frequency-domain characteristics from amongst the peak frequency, convexity, 3dB, ndB quality factor, equivalent rectangular bandwidth, maximum group delay, and phase accumulation. To develop our characteristics-based design methods, we derive expressions for the filter constants directly in terms of filter characteristics. The parameterization of GAFs in terms of sets of characteristics allows for specifying magnitude-based characteristics (e.g. bandwidths) and phase-based characteristics (e.g. group delays) simultaneously. This enables designing sharply tuned filters without significant group delay, and is particularly important in filterbanks where frequency selectivity and synchronization are both important aspects of design. Using our methods, we directly dictate values for desired filter characteristics - unlike iterative filter design methods. This allows for more direct design of GAFs for phase-picking from seismic signals, cochlear implants, and rainbow sensors. The methods also directly apply to related bandpass and multi-band filters.

[21]  arXiv:2404.15323 [pdf, other]
Title: Transportation mode recognition based on low-rate acceleration and location signals with an attention-based multiple-instance learning network
Comments: 13 pages, 5 figures, 9 tables, accepted in IEEE Transactions on Intelligent Transportation Systems
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

Transportation mode recognition (TMR) is a critical component of human activity recognition (HAR) that focuses on understanding and identifying how people move within transportation systems. It is commonly based on leveraging inertial, location, or both types of signals, captured by modern smartphone devices. Each type has benefits (such as increased effectiveness) and drawbacks (such as increased battery consumption) depending on the transportation mode (TM). Combining the two types is challenging as they exhibit significant differences such as very different sampling rates. This paper focuses on the TMR task and proposes an approach for combining the two types of signals in an effective and robust classifier. Our network includes two sub-networks for processing acceleration and location signals separately, using different window sizes for each signal. The two sub-networks are designed to also embed the two types of signals into the same space so that we can then apply an attention-based multiple-instance learning classifier to recognize TM. We use very low sampling rates for both signal types to reduce battery consumption. We evaluate the proposed methodology on a publicly available dataset and compare against other well known algorithms.

[22]  arXiv:2404.15324 [pdf, other]
Title: Advanced simulation-based predictive modelling for solar irradiance sensor farms
Journal-ref: Journal of Simulation, pp. 1-18, 2024
Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)

As solar power continues to grow and replace traditional energy sources, the need for reliable forecasting models becomes increasingly important to ensure the stability and efficiency of the grid. However, the management of these models still needs to be improved, and new tools and technologies are required to handle the deployment and control of solar facilities. This work introduces a novel framework named Cloud-based Analysis and Integration for Data Efficiency (CAIDE), designed for real-time monitoring, management, and forecasting of solar irradiance sensor farms. CAIDE is designed to manage multiple sensor farms simultaneously while improving predictive models in real-time using well-grounded Modeling and Simulation (M&S) methodologies. The framework leverages Model Based Systems Engineering (MBSE) and an Internet of Things (IoT) infrastructure to support the deployment and analysis of solar plants in dynamic environments. The system can adapt and re-train the model when given incorrect results, ensuring that forecasts remain accurate and up-to-date. Furthermore, CAIDE can be executed in sequential, parallel, and distributed architectures, assuring scalability. The effectiveness of CAIDE is demonstrated in a complex scenario composed of several solar irradiance sensor farms connected to a centralized management system. Our results show that CAIDE is scalable and effective in managing and forecasting solar power production while improving the accuracy of predictive models in real time. The framework has important implications for the deployment of solar plants and the future of renewable energy sources.

[23]  arXiv:2404.15326 [pdf, other]
Title: 5G-Advanced AI/ML Beam Management: Performance Evaluation with Integrated ML Models
Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Signal Processing (eess.SP)

The legacy beam management (BM) procedure in 5G introduces higher measurement and reporting overheads for larger beam codebooks resulting in higher power consumption of user equipment (UEs). Hence, the 3rd generation partnership project (3GPP) studied the use of artificial intelligence (AI) and machine learning (ML) in the air interface to reduce the overhead associated with the legacy BM procedure. The usage of AI/ML in BM is mainly discussed with regard to spatial-domain beam prediction (SBP) and time-domain beam prediction (TBP). In this study, we discuss different sub-use cases of SBP and TBP and evaluate the beam prediction accuracy of AI/ML models designed for each sub-use case along with AI/ML model generalization aspects. Moreover, a comprehensive system-level performance evaluation is presented in terms of user throughput with integrated AI/ML models to a 3GPP-compliant system-level simulator. Based on user throughput evaluations, we present AI/ML BM design guidelines for the deployment of lightweight, low-complexity AI/ML models discussed in this study.

[24]  arXiv:2404.15327 [pdf, other]
Title: A Low-Complexity Design for IRS-Assisted Secure Dual-Function Radar-Communication System
Comments: Submitted to IEEE Journal, under review. arXiv admin note: text overlap with arXiv:2310.00555
Subjects: Signal Processing (eess.SP)

In dual-function radar-communication (DFRC) systems, the probing signal contains information intended for the communication users, which makes that information vulnerable to eavesdropping by the targets. We study the security of a DFRC system aided by an intelligent reflecting surface (IRS) from the physical layer security (PLS) perspective. The IRS helps overcome path loss or blockage and introduces more degrees of freedom for system design; however, it also makes the design problem more challenging. In the system considered, the radar embeds artificial noise (AN) in the probing waveform, and the radar waveform, the AN, and the IRS parameters are designed to optimize the communication secrecy rate while meeting radar signal-to-noise ratio (SNR) constraints. The contribution of the paper is a novel, low-complexity approach to solve the underlying optimization problem and obtain the design parameters. In particular, we consider an alternating optimization approach, where in each iteration the problem is decomposed into two sub-problems: one that designs the IRS parameters, and another that jointly designs the radar waveform and the AN. The challenges in those sub-problems are the fractional objective, the SNR being a quartic function of the IRS parameters, and the unit-modulus constraint on the IRS parameters. A fractional programming technique is used to transform the fractional-form objective into a more tractable non-fractional polynomial form. A closed-form based approach is proposed for the IRS design problem, which results in a low-complexity IRS design. Numerical results are provided to demonstrate the convergence properties of the proposed system design method, as well as the secrecy rate and beamforming performance of the designed system.

[25]  arXiv:2404.15328 [pdf, other]
Title: Time topological analysis of EEG using signature theory
Comments: 14 pages, 5 figures. Under review for Journée des Statistiques 2024
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Machine Learning (stat.ML)

Anomaly detection in multivariate signals is a task of paramount importance in many disciplines (epidemiology, finance, cognitive sciences and neurosciences, oncology, etc.). In this perspective, Topological Data Analysis (TDA) offers a battery of "shape" invariants that can be exploited for the implementation of an effective detection scheme. Our contribution consists of extending the constructions presented in \cite{chretienleveraging} on the construction of simplicial complexes from the Signatures of signals and their predictive capacities, rather than the use of a generic distance as in \cite{petri2014homological}. Signature theory is a new theme in Machine Learning (arXiv:1603.03788), stemming from recent work on the notions of Rough Paths developed by Terry Lyons and his team \cite{lyons2002system} based on the formalism introduced by Chen \cite{chen1957integration}. We explore in particular the detection of changes in topology, based on tracking the evolution of homological persistence and the Betti numbers associated with the complex introduced in \cite{chretienleveraging}. We apply our tools to the analysis of brain signals such as EEG to detect precursor phenomena to epileptic seizures.

[26]  arXiv:2404.15329 [pdf, ps, other]
Title: Greedy Capon Beamformer
Authors: Esa Ollila
Comments: Submitted for publication
Subjects: Signal Processing (eess.SP); Methodology (stat.ME)

We propose the greedy Capon beamformer (GCB) for direction finding of narrow-band sources present in the array's viewing field. After defining the grid covering the location search space, the algorithm greedily builds the interference-plus-noise covariance matrix by identifying a high-power source on the grid using Capon's principle of maximizing the signal to interference plus noise ratio (SINR) while enforcing unit gain towards the signal of interest. An estimate of the power of the detected source is derived by exploiting the unit power constraint, which subsequently allows updating the noise covariance matrix by a simple rank-1 matrix addition composed of the outer product of the selected steering vector with itself, scaled by the signal power estimate. Our numerical examples demonstrate the effectiveness of the proposed GCB in direction finding, where it performs favourably compared to state-of-the-art algorithms under a broad variety of settings. Furthermore, GCB estimates of directions of arrival (DOAs) are very fast to compute.
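
As an illustration of two ingredients named in the abstract (and not the author's implementation), the sketch below computes Capon weights with unit gain towards a steering vector and applies a rank-1 covariance update, tracking the inverse with the Sherman-Morrison identity. The uniform linear array, the half-wavelength spacing, the identity starting covariance, the power value, and all function names are assumptions made for this example. The abstract does not say how the source power is estimated, so that step is left out.

```python
import cmath
import math

def steering_vector(theta, n_sensors, spacing=0.5):
    """Steering vector of a uniform linear array (assumed geometry)."""
    return [cmath.exp(2j * math.pi * spacing * k * math.sin(theta))
            for k in range(n_sensors)]

def matvec(M, v):
    """Matrix-vector product for nested lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def inner(u, v):
    """Hermitian inner product u^H v."""
    return sum(x.conjugate() * y for x, y in zip(u, v))

def capon_weights(Q_inv, a):
    """w = Q^{-1} a / (a^H Q^{-1} a), which gives w^H a = 1 (unit gain)."""
    q = matvec(Q_inv, a)
    denom = inner(a, q)
    return [x / denom for x in q]

def rank1_update_inverse(Q_inv, a, power):
    """Inverse of Q + power * a a^H via Sherman-Morrison (Q Hermitian)."""
    q = matvec(Q_inv, a)
    scale = power / (1.0 + power * inner(a, q).real)
    n = len(a)
    return [[Q_inv[i][j] - scale * q[i] * q[j].conjugate() for j in range(n)]
            for i in range(n)]

# Hypothetical usage: 4 sensors, identity starting covariance, one source.
n = 4
Q_inv = [[(1.0 if i == j else 0.0) + 0j for j in range(n)] for i in range(n)]
a = steering_vector(math.radians(20.0), n)
w = capon_weights(Q_inv, a)                         # unit gain towards a
Q_inv = rank1_update_inverse(Q_inv, a, power=2.0)   # made-up power value
```

Updating the inverse directly avoids a fresh matrix inversion after each detected source, which is one plausible reason a greedy scheme of this kind can be fast.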

[27]  arXiv:2404.15330 [pdf, other]
Title: Anchor Pair Selection in TDOA Positioning Systems by Door Transition Error Minimization
Comments: Originally presented at 2022 24th International Microwave and Radar Conference (MIKON), Gdansk, Poland, 2022
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

This paper presents an adaptive anchor pair selection algorithm for UWB (ultra-wideband) TDOA-based (Time Difference of Arrival) indoor positioning systems. The method divides the system operation area into zones. The most favorable anchor pairs are selected by minimizing the positioning errors in doorways leading to these zones, where possible user locations are limited to small, narrow areas. The sets are determined separately for entering and leaving a zone to take the user's body shadowing into account. The determined anchor pairs are then used to calculate TDOA values and localize the user moving around the apartment with an Extended Kalman Filter based algorithm. The method was tested experimentally in a furnished apartment. The results show that the adaptive selection of anchor pairs increases the user's localization accuracy. The median trajectory error was about 0.32 m.
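
For readers new to TDOA, the following minimal sketch shows how a TDOA value for one anchor pair follows from geometry, and how pairs could be ranked by a doorway error. It is not the paper's procedure: the function names, the anchor labels, and the form of the error table are assumptions, and the Extended Kalman Filter stage is not reproduced.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # propagation speed in m/s

def tdoa(tag, anchor_a, anchor_b, speed=SPEED_OF_LIGHT):
    """Time difference of arrival (s) of a tag's signal at anchors a and b."""
    return (math.dist(tag, anchor_a) - math.dist(tag, anchor_b)) / speed

def best_pairs(doorway_errors, n_pairs):
    """Return the n_pairs anchor pairs with the smallest doorway error.

    doorway_errors maps an anchor pair to a positioning error (m) measured
    or simulated in one doorway; its contents are hypothetical here.
    """
    return sorted(doorway_errors, key=doorway_errors.get)[:n_pairs]
```

Keeping one such error table per doorway and per walking direction would mirror the abstract's point that the selected sets differ for entering and leaving a zone.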

[28]  arXiv:2404.15331 [pdf, other]
Title: Comparing Self-Supervised Learning Techniques for Wearable Human Activity Recognition
Subjects: Signal Processing (eess.SP); Human-Computer Interaction (cs.HC)

Human Activity Recognition (HAR) based on the sensors of mobile/wearable devices aims to detect the physical activities performed by humans in their daily lives. Although supervised learning methods are the most effective in this task, their effectiveness is constrained to using a large amount of labeled data during training. While collecting raw unlabeled data can be relatively easy, annotating data is challenging due to costs, intrusiveness, and time constraints.
To address these challenges, this paper explores alternative approaches for accurate HAR using a limited amount of labeled data. In particular, we have adapted recent Self-Supervised Learning (SSL) algorithms to the HAR domain and compared their effectiveness. We investigate three state-of-the-art SSL techniques of different families: contrastive, generative, and predictive. Additionally, we evaluate the impact of the underlying neural network on the recognition rate by comparing state-of-the-art CNN and transformer architectures.
Our results show that a Masked Auto Encoder (MAE) approach significantly outperforms other SSL approaches, including SimCLR, commonly considered one of the best-performing SSL methods in the HAR domain.
The code and the pre-trained SSL models are publicly available for further research and development.

[29]  arXiv:2404.15332 [pdf, other]
Title: Clinical translation of machine learning algorithms for seizure detection in scalp electroencephalography: a systematic review
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

Machine learning algorithms for seizure detection have shown great diagnostic potential, with recent reported accuracies reaching 100%. However, few published algorithms have fully addressed the requirements for successful clinical translation. For example, the properties of training data may critically limit the generalisability of algorithms, algorithms may be sensitive to variability across EEG acquisition hardware, and run-time processing costs may render them unfeasible for real-time clinical use cases. Here, we systematically review machine learning seizure detection algorithms with a focus on clinical translatability, assessed by criteria including generalisability, run-time costs, explainability, and clinically-relevant performance metrics. For non-specialists, we provide domain-specific knowledge necessary to contextualise model development and evaluation. Our critical evaluation of machine learning algorithms with respect to their potential real-world effectiveness can help accelerate clinical translation and identify gaps in the current seizure detection literature.

[30]  arXiv:2404.15333 [pdf, other]
Title: EB-GAME: A Game-Changer in ECG Heartbeat Anomaly Detection
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

Cardiologists use electrocardiograms (ECG) for the detection of arrhythmias. However, continuous monitoring of ECG signals to detect cardiac abnormalities requires significant time and human resources. As a result, several deep learning studies have been conducted in advance for the automatic detection of arrhythmia. These models show relatively high performance in supervised learning, but are not applicable in cases with few training examples. This is because abnormal ECG data is scarce compared to normal data in most real-world clinical settings. Therefore, in this study, GAN-based anomaly detection, i.e., unsupervised learning, was employed to address the issue of data imbalance. This paper focuses on detecting abnormal signals in electrocardiograms (ECGs) using only labels from normal signals as training data. Inspired by self-supervised vision transformers, which learn by dividing images into patches, and masked auto-encoders, known for their effectiveness in patch reconstruction and solving information redundancy, we introduce the ECG Heartbeat Anomaly Detection model, EB-GAME. EB-GAME was trained and validated on the MIT-BIH Arrhythmia Dataset, where it achieved state-of-the-art performance on this benchmark.

[31]  arXiv:2404.15334 [pdf, ps, other]
Title: Performance Enhancement via Real-time Image-based Beam Tracking for WA-OWC with Dynamic Waves and Mobile Receivers
Subjects: Signal Processing (eess.SP)

Intensified underwater activities have driven the escalating demand for reliable, flexible, and high data-rate underwater communication links. Optical wireless communication (OWC) emerges as the most promising technology for short- to medium-range communication, facilitating the real-time high-speed transmission of information from undersea to an aerial vehicle which can subsequently relay the information to a terrestrial station. However, establishing a robust water-air link confronts two primary challenges: (i) beam wandering due to the time-varying refraction when the light beam passes through the undulating ocean surface and (ii) the drone's movement when it hovers above the ocean surface. This paper experimentally demonstrates a real-time image-based beam tracking system to mitigate beam misalignment due to dynamic waves and receiver movement over a 0.14-m underwater and 1.83-m free-space OWC channel. Experimental results evince a notable reduction in the standard deviation of the received light spot offset from the receiver. Moreover, the tracking system can proficiently accommodate receiver velocities of up to 150 cm/s while maintaining a paltry packet loss rate (PLR) below 10%. By addressing the combined effects of dynamic waves and moving receivers, the proposed beam tracking system successfully enables a 70% reduction in PLR and an order of magnitude decrease in bit error rate (BER). This results in a substantial 17-fold surge in maximum throughput, from 50 Mbps to 850 Mbps. The experimental results validate the feasibility and effectiveness of the beam tracking system for vanquishing the detrimental effects in the complex water-air OWC (WA-OWC) channel and supporting high-speed data transmission.

[32]  arXiv:2404.15335 [pdf, ps, other]
Title: Integrative Deep Learning Framework for Parkinson's Disease Early Detection using Gait Cycle Data Measured by Wearable Sensors: A CNN-GRU-GNN Approach
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

Efficient early diagnosis is paramount in addressing the complexities of Parkinson's disease because timely intervention can substantially mitigate symptom progression and improve patient outcomes. In this paper, we present a pioneering deep learning architecture tailored for the binary classification of subjects, utilizing gait cycle datasets to facilitate early detection of Parkinson's disease. Our model harnesses the power of 1D-Convolutional Neural Networks (CNN), Gated Recurrent Units (GRU), and Graph Neural Network (GNN) layers, synergistically capturing temporal dynamics and spatial relationships within the data. In this work, 16 wearable sensors located at the end of subjects' shoes for measuring the vertical Ground Reaction Force (vGRF) are considered as the vertices of a graph, their adjacencies are modelled as edges of this graph, and finally, the measured data of each sensor is considered as the feature vector of its corresponding vertex. Therefore, the GNN layers can extract the relations among these sensors by learning proper representations. Regarding the dynamic nature of these measurements, GRU and CNN are used to analyze them spatially and temporally and map them to an embedding space. Remarkably, our proposed model achieves exceptional performance metrics, boasting accuracy, precision, recall, and F1 score values of 99.51%, 99.57%, 99.71%, and 99.64%, respectively.

[33]  arXiv:2404.15336 [pdf, other]
Title: Machine Learning Techniques for Source Localisation in Elastic Media
Comments: MSc dissertation, Queen Mary University of London (2022 Calendar Year)
Subjects: Signal Processing (eess.SP)

Coronary Artery Disease (CAD) results from plaque deposit in a coronary artery. Early diagnosis is imperative, so a non-invasive detection method is being developed to identify acoustic signals caused by partial occlusions in the artery. The blood flow in the artery is disturbed and imposes oscillatory stresses on the artery wall. The deformations caused by the stresses can be detected at the chest surface. Therefore, by using data simulating these surface signals, which arise from randomly assigned source positions, machine learning (ML) can be utilised to predict the source of the occlusion. Seven ML algorithms were investigated, and the results from this study found that an ensemble model combining k-Nearest Neighbours and Random Forest had the best performance. The metrics used to evaluate this were the mean squared error and Euclidean distance.

[34]  arXiv:2404.15337 [pdf, other]
Title: RSSI Estimation for Constrained Indoor Wireless Networks using ANN
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI)

In the expanding field of the Internet of Things (IoT), wireless channel estimation is a significant challenge. This is especially true for low-power IoT (LP-IoT) communication, where efficiency and accuracy are extremely important. This research establishes two distinct LP-IoT wireless channel estimation models using Artificial Neural Networks (ANN): a Feature-based ANN model and a Sequence-based ANN model. Both models have been constructed to enhance LP-IoT communication by lowering the estimation error in the LP-IoT wireless channel. The Feature-based model aims to capture complex patterns of measured Received Signal Strength Indicator (RSSI) data using environmental characteristics. The Sequence-based approach utilises predetermined categorisation techniques to estimate the RSSI sequence of specifically selected environment characteristics. The findings demonstrate that our suggested approaches attain remarkable precision in channel estimation, with an improvement in MSE of 88.29% for the Feature-based model and 97.46% for the Sequence-based model over existing research. Additionally, the comparative analysis of these techniques with traditional and other Deep Learning (DL)-based techniques also highlights the superior performance of our developed models and their potential in real-world IoT applications.

[35]  arXiv:2404.15339 [pdf, other]
Title: Efficient EndoNeRF Reconstruction and Its Application for Data-driven Surgical Simulation
Comments: 14 pages, 4 figures. Accepted by International Journal of Computer Assisted Radiology and Surgery
Subjects: Image and Video Processing (eess.IV)

The healthcare industry has a growing need for realistic modeling and efficient simulation of surgical scenes. With effective models of deformable surgical scenes, clinicians are able to conduct surgical planning and surgery training on scenarios close to real-world cases. However, a significant challenge in achieving such a goal is the scarcity of high-quality soft tissue models with accurate shapes and textures. To address this gap, we present a data-driven framework that leverages emerging neural radiance field technology to enable high-quality surgical reconstruction and explore its application for surgical simulations. We first focus on developing a fast NeRF-based surgical scene 3D reconstruction approach that achieves state-of-the-art performance. This method can significantly outperform traditional 3D reconstruction methods, which have failed to capture large deformations and produce fine-grained shapes and textures. We then propose an automated creation pipeline of interactive surgical simulation environments through a closed mesh extraction algorithm. Our experiments have validated the superior performance and efficiency of our proposed approach in surgical scene 3D reconstruction. We further utilize our reconstructed soft tissues to conduct FEM and MPM simulations, showcasing the practical application of our method in data-driven surgical simulations.

[36]  arXiv:2404.15340 [pdf, other]
Title: RayPet: Unveiling Challenges and Solutions for Activity and Posture Recognition in Pets Using FMCW Mm-Wave Radar
Subjects: Signal Processing (eess.SP)

Recognizing animal activities plays a crucial role in monitoring animals' health and well-being. Additionally, a considerable audience is keen on monitoring their pets' well-being and health status. Insight into animals' habitual activities and patterns not only aids veterinarians in accurate diagnoses but also offers pet owners early alerts. Traditional methods of tracking animal behavior involve wearable sensors like IMU sensors, collars, or cameras. Nevertheless, concerns including privacy, robustness, and animal discomfort persist. In this study, radar technology, a noninvasive remote sensing technology widely employed in human health monitoring, is explored for animal activity recognition (AAR). Radar enables fine motion analysis through micro-Doppler spectrograms. Utilizing an off-the-shelf FMCW mm-wave radar, we gather data from five distinct activities and postures. Merging radar technology with Machine Learning and Deep Learning algorithms helps distinguish diverse pet activities and postures. Specific challenges in AAR, such as random and uncontrollable movements, noise, and small animal size, make radar adoption for animal monitoring complex. In this study, RayPet unveils different challenges and solutions regarding monitoring small animals. To overcome the challenges, different signal processing steps are devised and implemented, tailored for animals. We use four types of classifiers and achieve an accuracy rate of 89%. This progress marks an important step in using radar technology to observe and comprehend activities and postures in pets in particular and in animals in general, contributing to our knowledge of animal well-being and behavior analysis.

[37]  arXiv:2404.15341 [pdf, other]
Title: Classifier-guided neural blind deconvolution: a physics-informed denoising module for bearing fault diagnosis under heavy noise
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

Blind deconvolution (BD) has been demonstrated as an efficacious approach for extracting bearing fault-specific features from vibration signals under strong background noise. Despite BD's desirable feature in adaptability and mathematical interpretability, a significant challenge persists: How to effectively integrate BD with fault-diagnosing classifiers? This issue arises because the traditional BD method is solely designed for feature extraction with its own optimizer and objective function. When BD is combined with downstream deep learning classifiers, the different learning objectives will be in conflict. To address this problem, this paper introduces classifier-guided BD (ClassBD) for joint learning of BD-based feature extraction and deep learning-based fault classification. Firstly, we present a time and frequency neural BD that employs neural networks to implement conventional BD, thereby facilitating the seamless integration of BD and the deep learning classifier for co-optimization of model parameters. Subsequently, we develop a unified framework to use a deep learning classifier to guide the learning of BD filters. In addition, we devise a physics-informed loss function composed of kurtosis, $l_2/l_4$ norm, and a cross-entropy loss to jointly optimize the BD filters and deep learning classifier. Consequently, the fault labels provide useful information to direct BD to extract features that distinguish classes amidst strong noise. To the best of our knowledge, this is the first of its kind that BD is successfully applied to bearing fault diagnosis. Experimental results from three datasets demonstrate that ClassBD outperforms other state-of-the-art methods under noisy conditions.
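
The abstract names kurtosis and the l2/l4 norm as components of the loss. As a rough illustration of why these quantities are useful for impulsive fault signatures, the sketch below computes both for a plain list of samples. The function names and the example signals are assumptions, and the paper's actual loss, its weighting, and its cross-entropy term are not reproduced here.

```python
import math

def kurtosis(x):
    """Fourth standardized moment; larger for impulsive (spiky) signals."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / (m2 ** 2)

def l2_l4_ratio(x):
    """Ratio ||x||_2 / ||x||_4; smaller for sparse, impulsive signals."""
    l2 = math.sqrt(sum(v * v for v in x))
    l4 = sum(v ** 4 for v in x) ** 0.25
    return l2 / l4

# Made-up signals: a single impulse versus a constant-amplitude square wave.
impulsive = [0.0] * 99 + [1.0]
flat = [1.0, -1.0] * 50
```

On these two signals the impulse has the higher kurtosis and the lower l2/l4 ratio. A deconvolution filter trained to raise the first or lower the second is therefore pushed towards outputs with pronounced, repetitive impacts.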

[38]  arXiv:2404.15342 [pdf, other]
Title: WaveSleepNet: An Interpretable Network for Expert-like Sleep Staging
Authors: Yan Pei, Wei Luo
Comments: 17 pages, 6 figures
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

Although deep learning algorithms have proven their efficiency in automatic sleep staging, the widespread skepticism about their "black-box" nature has limited their clinical acceptance. In this study, we propose WaveSleepNet, an interpretable neural network for sleep staging that reasons in a similar way to sleep experts. In this network, we utilize the latent space representations generated during training to identify characteristic wave prototypes corresponding to different sleep stages. The feature representation of an input signal is segmented into patches within the latent space, each of which is compared against the learned wave prototypes. The proximity between these patches and the wave prototypes is quantified through scores, indicating the prototypes' presence and relative proportion within the signal. The scores serve as the decision-making criteria for final sleep staging. During training, an ensemble of loss functions is employed for the prototypes' diversity and robustness. Furthermore, the learned wave prototypes are visualized by analysing occlusion sensitivity. The efficacy of WaveSleepNet is validated across three public datasets, achieving sleep staging performance that is on par with state-of-the-art models when several WaveSleepNets are combined into a larger network. A detailed case study examined the decision-making process of WaveSleepNet, which aligns closely with the American Academy of Sleep Medicine (AASM) manual guidelines. Another case study systematically explained the reasons behind misidentifications of each sleep stage. WaveSleepNet's transparent process provides specialists with direct access to the physiological significance of its criteria, allowing for future adaptation or enrichment by sleep experts.

[39]  arXiv:2404.15343 [pdf, other]
Title: Edge-Efficient Deep Learning Models for Automatic Modulation Classification: A Performance Analysis
Comments: Accepted at the IEEE Wireless Communications and Networking Conference (WCNC) 2024
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT); Machine Learning (cs.LG); Machine Learning (stat.ML)

The recent advancement in deep learning (DL) for automatic modulation classification (AMC) of wireless signals has encouraged numerous possible applications on resource-constrained edge devices. However, developing optimized DL models suitable for edge applications of wireless communications is yet to be studied in depth. In this work, we perform a thorough investigation of optimized convolutional neural networks (CNNs) developed for AMC using the three most commonly used model optimization techniques: a) pruning, b) quantization, and c) knowledge distillation. Furthermore, we have proposed optimized models with the combinations of these techniques to fuse the complementary optimization benefits. The performances of all the proposed methods are evaluated in terms of sparsity, storage compression for network parameters, and the effect on classification accuracy with a reduction in parameters. The experimental results show that the proposed individual and combined optimization techniques are highly effective for developing models with significantly less complexity while maintaining or even improving classification performance compared to the benchmark CNNs.
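
To make two of the three optimization techniques concrete, the following sketch applies magnitude pruning and symmetric uniform quantization to a flat list of weights. These are generic textbook variants chosen for illustration, not necessarily those used in the paper. The function names and the 8-bit default are assumptions, and knowledge distillation is omitted because it requires a training loop.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
    n_prune = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned

def quantize_uniform(weights, n_bits=8):
    """Symmetric uniform quantization to signed integers plus one scale."""
    max_abs = max(abs(w) for w in weights) or 1.0  # avoid a zero scale
    q_max = 2 ** (n_bits - 1) - 1
    scale = max_abs / q_max
    return [round(w / scale) for w in weights], scale

def dequantize(q_weights, scale):
    """Map quantized integers back to approximate floating-point weights."""
    return [q * scale for q in q_weights]
```

Pruning raises sparsity, while quantization shrinks the storage needed per remaining parameter, so the two savings can be combined, in line with the abstract's remark about fusing complementary benefits.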

[40]  arXiv:2404.15344 [pdf, other]
Title: Adversarial Robustness of Distilled and Pruned Deep Learning-based Wireless Classifiers
Comments: Accepted at the IEEE Wireless Communications and Networking Conference (WCNC) 2024
Subjects: Signal Processing (eess.SP); Cryptography and Security (cs.CR); Information Theory (cs.IT); Machine Learning (cs.LG); Machine Learning (stat.ML)

Data-driven deep learning (DL) techniques developed for automatic modulation classification (AMC) of wireless signals are vulnerable to adversarial attacks. This poses a severe security threat to the DL-based wireless systems, specifically for edge applications of AMC. In this work, we address the joint problem of developing optimized DL models that are also robust against adversarial attacks. This enables efficient and reliable deployment of DL-based AMC on edge devices. We first propose two optimized models using knowledge distillation and network pruning, followed by a computationally efficient adversarial training process to improve the robustness. Experimental results on five white-box attacks show that the proposed optimized and adversarially trained models can achieve better robustness than the standard (unoptimized) model. The two optimized models also achieve higher accuracy on clean (unattacked) samples, which is essential for the reliability of DL-based solutions at edge applications.

[41]  arXiv:2404.15346 [pdf, other]
Title: A Novel Micro-Doppler Coherence Loss for Deep Learning Radar Applications
Comments: Presented at 2021 18th European Radar Conference (EuRAD)
Subjects: Signal Processing (eess.SP); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Deep learning techniques are subject to increasing adoption for a wide range of micro-Doppler applications, where predictions need to be made based on time-frequency signal representations. Most, if not all, of the reported applications focus on translating an existing deep learning framework to this new domain with no adjustment made to the objective function. This practice results in a missed opportunity to encourage the model to prioritize features that are particularly relevant for micro-Doppler applications. Thus the paper introduces a micro-Doppler coherence loss, minimized when the normalized power of micro-Doppler oscillatory components between input and output is matched. The experiments conducted on real data show that the application of the introduced loss results in models more resilient to noise.

[42]  arXiv:2404.15347 [pdf, ps, other]
Title: Advanced Neural Network Architecture for Enhanced Multi-Lead ECG Arrhythmia Detection through Optimized Feature Extraction
Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Cardiovascular diseases are a pervasive global health concern, contributing significantly to morbidity and mortality rates worldwide. Among these conditions, arrhythmia, characterized by irregular heart rhythms, presents formidable diagnostic challenges. This study introduces an innovative approach utilizing deep learning techniques, specifically Convolutional Neural Networks (CNNs), to address the complexities of arrhythmia classification. Leveraging multi-lead Electrocardiogram (ECG) data, our CNN model, comprising six layers with a residual block, demonstrates promising outcomes in identifying five distinct heartbeat types: Left Bundle Branch Block (LBBB), Right Bundle Branch Block (RBBB), Atrial Premature Contraction (APC), Premature Ventricular Contraction (PVC), and Normal Beat. Through rigorous experimentation, we highlight the transformative potential of our methodology in enhancing diagnostic accuracy for cardiovascular arrhythmias. Arrhythmia diagnosis remains a critical challenge in cardiovascular care, often relying on manual interpretation of ECG signals, which can be time-consuming and prone to subjectivity. To address these limitations, we propose a novel approach that leverages deep learning algorithms to automate arrhythmia classification. By employing advanced CNN architectures and multi-lead ECG data, our methodology offers a robust solution for precise and efficient arrhythmia detection. Through comprehensive evaluation, we demonstrate the effectiveness of our approach in facilitating more accurate clinical decision-making, thereby improving patient outcomes in managing cardiovascular arrhythmias.

[43]  arXiv:2404.15348 [pdf, ps, other]
Title: High-Linearity PAM-4 Silicon Micro-ring Transmitter Architecture with Electronic-Photonic Hybrid DAC
Comments: 14 pages, 11 figures
Subjects: Signal Processing (eess.SP); Optics (physics.optics)

This paper presents a high-linearity PAM-4 transmitter (TX) architecture consisting of a three-segment micro-ring modulator (MRM) and a matched CMOS driver. This architecture can drive a high-linearity 4-level pulse amplitude modulation (PAM-4) signal, thereby extending the tunable operating wavelength range for achieving linear PAM-4 output. We use the three-segment MRM to increase design flexibility so that the linearity of the PAM-4 output can be optimized with another degree of freedom. Each phase shift region is directly driven by an independently amplitude-tunable Non-Return-to-Zero (NRZ) signal. The three-segment modulator can achieve an adjustable wavelength range of approximately 0.037 nm within the high-linearity PAM-4 output limit when the driving voltage varies from 1.5 V to 3 V, simultaneously achieving an adjustable insertion loss (IL) range of approximately 2 dB, roughly four times that of the two-segment MRM with a similar design. The driver circuit with adjustable driving voltage is co-designed to adjust the eye height to improve PAM-4 linearity. The proposed high-linearity PAM-4 silicon micro-ring architecture can be employed in optical transmitters to adjust PAM-4 eye-opening size and maximize the PAM-4 output linearity, thus offering the potential for high-performance and low-power overhead transmitters.

[44]  arXiv:2404.15349 [pdf, other]
Title: A Survey on Multimodal Wearable Sensor-based Human Action Recognition
Comments: Multimodal Survey for Wearable Sensor-based Human Action Recognition
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Multimedia (cs.MM)

The combination of increased life expectancy and falling birth rates is resulting in an aging population. Wearable Sensor-based Human Activity Recognition (WSHAR) emerges as a promising assistive technology to support the daily lives of older individuals, unlocking vast potential for human-centric applications. However, recent surveys in WSHAR have been limited, focusing either solely on deep learning approaches or on a single sensor modality. In real life, humans interact with the world in a multi-sensory way, where diverse information sources are intricately processed and interpreted to form a complex and unified sensing system. To give machines similar intelligence, multimodal machine learning, which merges data from various sources, has become a popular research area with recent advancements. In this study, we present a comprehensive survey from a novel perspective on how to apply multimodal learning to the WSHAR domain, aimed at newcomers and researchers. We begin by presenting the recent sensor modalities as well as deep learning approaches in HAR. Subsequently, we explore the techniques used in present multimodal systems for WSHAR. This includes inter-multimodal systems, which utilize sensor modalities from both visual and non-visual systems, and intra-multimodal systems, which take modalities only from non-visual systems. After that, we focus on current multimodal learning approaches that have been applied to solve some of the challenges existing in WSHAR. Specifically, we make extra efforts by connecting the existing multimodal literature from other domains, such as computer vision and natural language processing, with the current WSHAR area. Finally, we identify the corresponding challenges and potential research directions in the current WSHAR area for further improvement.

[45]  arXiv:2404.15350 [pdf, other]
Title: Evaluating Fast Adaptability of Neural Networks for Brain-Computer Interface
Comments: Accepted in IJCNN 2024
Subjects: Signal Processing (eess.SP); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)

Electroencephalography (EEG) classification is a versatile and portable technique for building non-invasive Brain-computer Interfaces (BCI). However, the classifiers that decode cognitive states from EEG brain data perform poorly when tested on newer domains, such as tasks or individuals absent during model training. Researchers have recently used complex strategies like Model-agnostic meta-learning (MAML) for domain adaptation. Nevertheless, there is a need for a strategy to evaluate the fast adaptability of the models, as this characteristic is essential for quick calibration in real-life BCI applications. We used motor movement and imagery signals as input to a Convolutional Neural Network (CNN) based classifier for the experiments. Datasets with EEG signals typically have fewer examples and higher time resolution. Even though batch-normalization is preferred for CNNs, we empirically show that layer-normalization can improve the adaptability of CNN-based EEG classifiers with not more than ten fine-tuning steps. In summary, the present work (i) proposes a simple strategy to evaluate fast adaptability, and (ii) empirically demonstrates fast adaptability across individuals as well as across tasks with simple transfer learning as compared to the MAML approach.
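
For readers comparing the two normalization schemes named in this abstract, the following is a minimal sketch of the standard definitions, not the authors' implementation. The function names are hypothetical, and the learnable scale and shift parameters used in practice are omitted. It illustrates that layer normalization uses only per-example statistics, whereas batch normalization depends on the other examples in the batch.

```python
import math


def layer_norm(features, eps=1e-5):
    """Normalize one example's features to zero mean and unit variance."""
    mean = sum(features) / len(features)
    var = sum((x - mean) ** 2 for x in features) / len(features)
    return [(x - mean) / math.sqrt(var + eps) for x in features]


def batch_norm(batch, eps=1e-5):
    """Normalize each feature column with statistics taken across the batch."""
    columns = list(zip(*batch))
    # Reuse the same arithmetic, but along the batch axis instead of the feature axis.
    normalized_columns = [layer_norm(list(col), eps) for col in columns]
    return [list(row) for row in zip(*normalized_columns)]


# Two examples with two features each (illustrative values only).
batch = [[1.0, 10.0], [3.0, 30.0]]
per_example = [layer_norm(example) for example in batch]
per_batch = batch_norm(batch)
```

With very small batches, the batch statistics in `batch_norm` are estimated from few samples, while `layer_norm` is unaffected by batch size.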

[46]  arXiv:2404.15351 [pdf, other]
Title: Integrating Physiological Data with Large Language Models for Empathic Human-AI Interaction
Subjects: Signal Processing (eess.SP); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)

This paper explores enhancing empathy in Large Language Models (LLMs) by integrating them with physiological data. We propose a physiological computing approach that includes developing deep learning models that use physiological data for recognizing psychological states and integrating the predicted states with LLMs for empathic interaction. We showcase the application of this approach in an Empathic LLM (EmLLM) chatbot for stress monitoring and control. We also discuss the results of a pilot study that evaluates this EmLLM chatbot based on its ability to accurately predict user stress, provide human-like responses, and assess the therapeutic alliance with the user.

[47]  arXiv:2404.15352 [pdf, other]
Title: TransfoRhythm: A Transformer Architecture Conductive to Blood Pressure Estimation via Solo PPG Signal Capturing
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

Recent statistics indicate that approximately 1.3 billion individuals worldwide suffer from hypertension, a leading cause of premature death globally. Blood pressure (BP) serves as a critical health indicator for accurate and timely diagnosis and/or treatment of hypertension. Driven by recent advancements in Artificial Intelligence (AI) and Deep Neural Networks (DNNs), there has been a surge of interest in developing data-driven and cuff-less BP estimation solutions. In this context, current literature predominantly focuses on coupling Electrocardiography (ECG) and Photoplethysmography (PPG) sensors, though this approach is constrained by reliance on multiple sensor types. An alternative, utilizing standalone PPG signals, presents challenges due to the absence of auxiliary sensors (ECG), requiring the use of morphological features while addressing motion artifacts and high-frequency noise. To address these issues, the paper introduces the TransfoRhythm framework, a Transformer-based DNN architecture built upon the recently released physiological database, MIMIC-IV. Leveraging the Multi-Head Attention (MHA) mechanism, TransfoRhythm identifies dependencies and similarities across data segments, forming a robust framework for cuff-less BP estimation solely using PPG signals. To our knowledge, this paper represents the first study to apply the MIMIC-IV dataset for cuff-less BP estimation, and TransfoRhythm is the first MHA-based model trained via MIMIC-IV for BP prediction. Performance evaluation through comprehensive experiments demonstrates TransfoRhythm's superiority over its state-of-the-art counterparts. Specifically, TransfoRhythm achieves highly accurate results with Root Mean Square Error (RMSE) of [1.84, 1.42] and Mean Absolute Error (MAE) of [1.50, 1.17] for systolic and diastolic blood pressures, respectively.

[48]  arXiv:2404.15353 [pdf, other]
Title: SQUWA: Signal Quality Aware DNN Architecture for Enhanced Accuracy in Atrial Fibrillation Detection from Noisy PPG Signals
Comments: 15 pages; 9 figures; 2024 Conference on Health, Inference, and Learning (CHIL)
Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Atrial fibrillation (AF), a common cardiac arrhythmia, significantly increases the risk of stroke, heart disease, and mortality. Photoplethysmography (PPG) offers a promising solution for continuous AF monitoring, due to its cost efficiency and integration into wearable devices. Nonetheless, PPG signals are susceptible to corruption from motion artifacts and other factors often encountered in ambulatory settings. Conventional approaches typically discard corrupted segments or attempt to reconstruct original signals, allowing for the use of standard machine learning techniques. However, this reduces dataset size and introduces biases, compromising prediction accuracy and the effectiveness of continuous monitoring. We propose a novel deep learning model, Signal Quality Weighted Fusion of Attentional Convolution and Recurrent Neural Network (SQUWA), designed to learn how to retain accurate predictions from partially corrupted PPG. Specifically, SQUWA integrates an attention mechanism that directly considers signal quality during the learning process, dynamically adjusting the weights of time series segments based on their quality. This approach enhances the influence of higher-quality segments while reducing that of lower-quality ones, effectively utilizing partially corrupted segments. It represents a departure from conventional methods that exclude such segments, enabling the utilization of a broader range of data, which has significant implications for less disruptive monitoring of AF risk and more accurate estimation of AF burden. Our extensive experiments show that SQUWA outperforms existing PPG-based models, achieving the highest AUCPR of 0.89 with label noise mitigation. This also exceeds the 0.86 AUCPR of models trained using both electrocardiogram (ECG) and PPG data.

[49]  arXiv:2404.15354 [pdf, other]
Title: Elevating Spectral GNNs through Enhanced Band-pass Filter Approximation
Comments: Preprint
Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Numerical Analysis (math.NA)

Spectral Graph Neural Networks (GNNs) have attracted great attention due to their capacity to capture patterns in the frequency domain with essential graph filters. Polynomial-based ones (namely poly-GNNs), which approximately construct graph filters with conventional or rational polynomials, are routinely adopted in practice for their substantial performance on graph learning tasks. However, previous poly-GNNs aim at achieving overall lower approximation error on different types of filters, e.g., low-pass and high-pass, but ignore a key question: \textit{which type of filter warrants greater attention for poly-GNNs?} In this paper, we first show that a poly-GNN with a better approximation for band-pass graph filters performs better on graph learning tasks. This insight further sheds light on a critical issue of existing poly-GNNs, i.e., those poly-GNNs achieve trivial performance in approximating band-pass graph filters, hindering the great potential of poly-GNNs. To tackle this issue, we propose a novel poly-GNN named TrigoNet. TrigoNet constructs different graph filters with a novel trigonometric polynomial, and achieves leading performance in approximating band-pass graph filters against other polynomials. By applying Taylor expansion and discarding nonlinearity, TrigoNet achieves noticeable efficiency among baselines. Extensive experiments show the advantages of TrigoNet in both accuracy and efficiency.

[50]  arXiv:2404.15360 [pdf, other]
Title: Towards Robust and Interpretable EMG-based Hand Gesture Recognition using Deep Metric Meta Learning
Comments: 11 pages, 9 figures, submitted to IEEE Transactions on Neural Networks and Learning Systems
Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Systems and Control (eess.SY)

Current electromyography (EMG) pattern recognition (PR) models have been shown to generalize poorly in unconstrained environments, setting back their adoption in applications such as hand gesture control. This problem is often due to limited training data, exacerbated by the use of supervised classification frameworks that are known to be suboptimal in such settings. In this work, we propose a shift to deep metric-based meta-learning in EMG PR to supervise the creation of meaningful and interpretable representations. We use a Siamese Deep Convolutional Neural Network (SDCNN) and contrastive triplet loss to learn an EMG feature embedding space that captures the distribution of the different classes. A nearest-centroid approach is subsequently employed for inference, relying on how closely a test sample aligns with the established data distributions. We derive a robust class proximity-based confidence estimator that leads to a better rejection of incorrect decisions, i.e. false positives, especially when operating beyond the training data domain. We show our approach's efficacy by testing the trained SDCNN's predictions and confidence estimations on unseen data, both in and out of the training domain. The evaluation metrics include the accuracy-rejection curve and the Kullback-Leibler divergence between the confidence distributions of accurate and inaccurate predictions. Outperforming comparable models on both metrics, our results demonstrate that the proposed meta-learning approach improves the classifier's precision in active decisions (after rejection), thus leading to better generalization and applicability.
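
The two ingredients named in this abstract, a triplet loss and nearest-centroid inference, can be sketched as follows. This is a minimal illustration of the textbook definitions under assumed Euclidean distances, not the authors' SDCNN or their confidence estimator; all function names, class labels, and sample values are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equally sized embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: pull the positive closer than the negative by a margin."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def class_centroids(embeddings_by_class):
    """Average the embeddings of each class to get one centroid per class."""
    centroids = {}
    for label, vectors in embeddings_by_class.items():
        dim = len(vectors[0])
        centroids[label] = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    return centroids


def predict(embedding, centroids):
    """Assign the label of the closest class centroid."""
    return min(centroids, key=lambda label: euclidean(embedding, centroids[label]))


# Hypothetical 2-D embeddings for two gesture classes.
training = {"fist": [[0.0, 0.0], [2.0, 0.0]], "open": [[10.0, 10.0], [12.0, 10.0]]}
centroids = class_centroids(training)
label = predict([2.0, 1.0], centroids)
```

In a real pipeline the embeddings would come from the trained network; here they are hard-coded so the inference step can be read in isolation.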

[51]  arXiv:2404.15364 [pdf, other]
Title: MP-DPD: Low-Complexity Mixed-Precision Neural Networks for Energy-Efficient Digital Predistortion of Wideband Power Amplifiers
Comments: Accepted to IEEE Microwave and Wireless Technology Letters (MWTL)
Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Digital Pre-Distortion (DPD) enhances signal quality in wideband RF power amplifiers (PAs). As signal bandwidths expand in modern radio systems, DPD's energy consumption increasingly impacts overall system efficiency. Deep Neural Networks (DNNs) offer promising advancements in DPD, yet their high complexity hinders their practical deployment. This paper introduces open-source mixed-precision (MP) neural networks that employ quantized low-precision fixed-point parameters for energy-efficient DPD. This approach reduces computational complexity and memory footprint, thereby lowering power consumption without compromising linearization efficacy. Applied to a 160MHz-BW 1024-QAM OFDM signal from a digital RF PA, MP-DPD gives no performance loss against 32-bit floating-point precision DPDs, while achieving -43.75 (L)/-45.27 (R) dBc in Adjacent Channel Power Ratio (ACPR) and -38.72 dB in Error Vector Magnitude (EVM). A 16-bit fixed-point-precision MP-DPD enables a 2.8X reduction in estimated inference power. The PyTorch learning and testing code is publicly available at \url{https://github.com/lab-emi/OpenDPD}.
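
As a rough illustration of what quantizing a parameter to low-precision fixed point involves, the sketch below rounds a floating-point value onto a signed fixed-point grid and saturates at the representable range. It is a generic example, not code from the OpenDPD repository; the function name and the choice of 12 fractional bits are assumptions.

```python
def quantize_fixed_point(value, total_bits=16, frac_bits=12):
    """Round a float to signed fixed point and return the dequantized value."""
    scale = 1 << frac_bits                  # number of grid steps per unit
    q_min = -(1 << (total_bits - 1))        # most negative integer code
    q_max = (1 << (total_bits - 1)) - 1     # most positive integer code
    code = round(value * scale)             # nearest integer code
    code = max(q_min, min(q_max, code))     # saturate instead of wrapping around
    return code / scale


# Hypothetical weights quantized to an 8-bit format with 4 fractional bits.
weights = [0.3, -0.72, 100.0]
quantized = [quantize_fixed_point(w, total_bits=8, frac_bits=4) for w in weights]
```

Fewer total bits shrink storage and arithmetic cost, while the split between integer and fractional bits trades dynamic range against resolution.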

[52]  arXiv:2404.15366 [pdf, other]
Title: A Weight-aware-based Multi-source Unsupervised Domain Adaptation Method for Human Motion Intention Recognition
Comments: 13 pages, 5 figures
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

Accurate recognition of human motion intention (HMI) helps exoskeleton robots improve wearing comfort and achieve natural human-robot interaction. A classifier trained on labeled source subjects (domains) performs poorly on an unlabeled target subject because of differences in individual motor characteristics. Unsupervised domain adaptation (UDA) has become an effective way to address this problem. However, the labeled data are collected from multiple source subjects that might differ not only from the target subject but also from each other. Current UDA methods for HMI recognition ignore the differences between source subjects, which reduces classification accuracy. Therefore, this paper considers the differences between source subjects and develops a novel theory and algorithm for UDA to recognize HMI, where the margin disparity discrepancy (MDD) is extended to multi-source UDA theory and a novel weight-aware-based multi-source UDA algorithm (WMDD) is proposed. The source domain weight, which can be adjusted adaptively by the MDD between each source subject and the target subject, is incorporated into UDA to measure the differences between source subjects. The developed multi-source UDA theory is theoretically grounded, and the generalization error on the target subject is guaranteed. The theory can be transformed into an optimization problem for UDA, successfully bridging the gap between theory and algorithm. Moreover, a lightweight network is employed to guarantee real-time classification, and adversarial learning between the feature generator and ensemble classifiers is utilized to further improve the generalization ability. Extensive experiments verify the theoretical analysis and show that WMDD outperforms previous UDA methods on HMI recognition tasks.

[53]  arXiv:2404.15367 [pdf, other]
Title: Leveraging Visibility Graphs for Enhanced Arrhythmia Classification with Graph Convolutional Networks
Subjects: Signal Processing (eess.SP); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Arrhythmias, detectable via electrocardiograms (ECGs), pose significant health risks, emphasizing the need for robust automated identification techniques. Although traditional deep learning methods have shown potential, recent advances in graph-based strategies are aimed at enhancing arrhythmia detection performance. However, effectively representing ECG signals as graphs remains a challenge. This study explores graph representations of ECG signals using Visibility Graph (VG) and Vector Visibility Graph (VVG), coupled with Graph Convolutional Networks (GCNs) for arrhythmia classification. Through experiments on the MIT-BIH dataset, we investigated various GCN architectures and preprocessing parameters. The results reveal that GCNs, when integrated with VG and VVG for signal graph mapping, can classify arrhythmias without the need for preprocessing or noise removal from ECG signals. While both VG and VVG methods show promise, VG is notably more efficient. The proposed approach was competitive compared to baseline methods, although classifying the S class remains challenging, especially under the inter-patient paradigm. Computational complexity, particularly with the VVG method, required data balancing and sophisticated implementation strategies. The source code is publicly available for further research and development at https://github.com/raffoliveira/VG_for_arrhythmia_classification_with_GCN.
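
To make the signal-to-graph mapping concrete, here is a minimal sketch of the natural visibility graph construction, in which two samples are linked when the straight line between them passes above every sample in between. The abstract does not say which VG variant is used, so the natural variant is an assumption, and this brute-force version is not the code from the linked repository.

```python
def natural_visibility_graph(series):
    """Return the set of edges (i, j), i < j, of the natural visibility graph."""
    n = len(series)
    edges = set()
    for i in range(n - 1):
        for j in range(i + 1, n):
            # Samples i and j see each other if every sample between them lies
            # strictly below the line segment joining (i, y_i) and (j, y_j).
            visible = all(
                series[k] < series[i] + (series[j] - series[i]) * (k - i) / (j - i)
                for k in range(i + 1, j)
            )
            if visible:
                edges.add((i, j))
    return edges


# Toy "beat" with a single peak in the middle (illustrative values only).
edges = natural_visibility_graph([1.0, 3.0, 1.0])
```

The resulting edge set can be turned into an adjacency matrix and passed to a graph neural network. The nested loops make this sketch cubic in the worst case, which is acceptable only for short segments.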

[54]  arXiv:2404.15368 [pdf, other]
Title: Unmasking the Role of Remote Sensors in Comfort, Energy and Demand Response
Comments: 13 Figures, 8 Tables, 25 Pages. Submitted to Data-Centric Engineering Journal and it is under review
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Systems and Control (eess.SY); Applications (stat.AP)

In single-zone multi-room houses (SZMRHs), temperature controls rely on a single probe near the thermostat, resulting in temperature discrepancies that cause thermal discomfort and energy waste. Augmenting smart thermostats (STs) with per-room sensors has gained acceptance by major ST manufacturers. This paper leverages additional sensory information to empirically characterize the services provided by buildings, including thermal comfort, energy efficiency, and demand response (DR). Utilizing room-level time-series data from 1,000 houses, metadata from 110,000 houses across the United States, and data from two real-world testbeds, we examine the limitations of SZMRHs and explore the potential of remote sensors. We discovered that comfortable DR durations (CDRDs) for rooms are typically 70% longer or 40% shorter than for the room with the thermostat. When averaging, rooms at the control temperature's bounds typically deviate by around -3°F to 2.5°F from the average. Moreover, in 95% of houses, we identified rooms experiencing notably higher solar gains compared to the rest of the rooms, while 85% and 70% of houses demonstrated lower heat input and poor insulation, respectively. Lastly, it became evident that the consumption of cooling energy escalates with the increase in the number of sensors, whereas heating usage experiences fluctuations ranging from -19% to +25%. This study serves as a benchmark for assessing the thermal comfort and DR services in the existing housing stock, while also highlighting the energy efficiency impacts of sensing technologies. Our approach sets the stage for more granular, precise control strategies of SZMRHs.

[55]  arXiv:2404.15370 [pdf, other]
Title: Self-Supervised Learning for User Localization
Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI)

Machine learning techniques have shown remarkable accuracy in localization tasks, but their dependency on vast amounts of labeled data, particularly Channel State Information (CSI) and corresponding coordinates, remains a bottleneck. Self-supervised learning techniques alleviate the need for labeled data, a potential that remains largely untapped and underexplored in existing research. Addressing this gap, we propose a pioneering approach that leverages self-supervised pretraining on unlabeled data to boost the performance of supervised learning for user localization based on CSI. We introduce two pretraining Auto Encoder (AE) models employing Multi Layer Perceptrons (MLPs) and Convolutional Neural Networks (CNNs) to glean representations from unlabeled data via self-supervised learning. Following this, we utilize the encoder portion of the AE models to extract relevant features from labeled data, and finetune an MLP-based Position Estimation Model to accurately deduce user locations. Our experimentation on the CTW-2020 dataset, which features a substantial volume of unlabeled data but limited labeled samples, demonstrates the viability of our approach. Notably, the dataset covers a vast area spanning over 646x943x41 meters, and our approach demonstrates promising results even for such expansive localization tasks.

[56]  arXiv:2404.15371 [pdf, other]
Title: Efficient Verification of a RADAR SoC Using Formal and Simulation-Based Methods
Comments: Published in DVCon Europe 2023
Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI)

As the demand for Internet of Things (IoT) and Human-to-Machine Interaction (HMI) increases, modern System-on-Chips (SoCs) offering such solutions are becoming increasingly complex. This intricate design poses significant challenges for verification, particularly when time-to-market is a crucial factor for consumer electronics products. This paper presents a case study based on our work to verify a complex Radio Detection And Ranging (RADAR) based SoC that performs on-chip sensing of human motion with millimetre accuracy. We leverage both formal and simulation-based methods to complement each other and achieve verification sign-off with high confidence. While employing a requirements-driven flow approach, we demonstrate the use of different verification methods to cater to multiple requirements and highlight our know-how from the project. Additionally, we used Machine Learning (ML) based methods, specifically the Xcelium ML tool from Cadence, to improve verification throughput.

[57]  arXiv:2404.15373 [pdf, other]
Title: Robust EEG-based Emotion Recognition Using an Inception and Two-sided Perturbation Model
Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Automated emotion recognition using electroencephalogram (EEG) signals has gained substantial attention. Although deep learning approaches exhibit strong performance, they often suffer from vulnerabilities to various perturbations, like environmental noise and adversarial attacks. In this paper, we propose an Inception feature generator and two-sided perturbation (INC-TSP) approach to enhance emotion recognition in brain-computer interfaces. INC-TSP integrates the Inception module for EEG data analysis and employs two-sided perturbation (TSP) as a defensive mechanism against input perturbations. TSP introduces worst-case perturbations to the model's weights and inputs, reinforcing the model's elasticity against adversarial attacks. The proposed approach addresses the challenge of maintaining accurate emotion recognition in the presence of input uncertainties. We validate INC-TSP in a subject-independent three-class emotion recognition scenario, demonstrating robust performance.

[58]  arXiv:2404.15374 [pdf, other]
Title: Minimum Description Feature Selection for Complexity Reduction in Machine Learning-based Wireless Positioning
Comments: This paper has been accepted for the publication in IEEE Journal on Selected Areas in Communications. arXiv admin note: text overlap with arXiv:2402.09580
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

Recently, deep learning approaches have provided solutions to difficult problems in wireless positioning (WP). Although these WP algorithms have attained excellent and consistent performance against complex channel environments, the computational complexity coming from processing high-dimensional features can be prohibitive for mobile applications. In this work, we design a novel positioning neural network (P-NN) that utilizes the minimum description features to substantially reduce the complexity of deep learning-based WP. P-NN's feature selection strategy is based on maximum power measurements and their temporal locations to convey information needed to conduct WP. We improve P-NN's learning ability by intelligently processing two different types of inputs: sparse image and measurement matrices. Specifically, we implement a self-attention layer to reinforce the training ability of our network. We also develop a technique to adapt feature space size, optimizing over the expected information gain and the classification capability quantified with information-theoretic measures on signal bin selection. Numerical results show that P-NN achieves a significant advantage in performance-complexity tradeoff over deep learning baselines that leverage the full power delay profile (PDP). In particular, we find that P-NN achieves a large improvement in performance for low SNR, as unnecessary measurements are discarded in our minimum description features.

[59]  arXiv:2404.15375 [pdf, other]
Title: MIMO Multipath-based SLAM for Non-Ideal Reflective Surfaces
Comments: 8 pages, 4 figures
Subjects: Signal Processing (eess.SP)

Multipath-based simultaneous localization and mapping (MP-SLAM) is a well-established approach to obtain position information of transmitters and receivers as well as information regarding the propagation environments in future multiple input multiple output (MIMO) communication systems. Conventional methods for MP-SLAM consider specular reflections of the radio signals occurring at smooth, flat surfaces, which are modeled by virtual anchors (VAs) that are mirror images of the physical anchors (PAs), with each VA generating a single multipath component (MPC). However, non-ideal reflective surfaces (such as walls covered by shelves or cupboards) cause dispersion effects that violate the VA model and lead to multiple MPCs that are associated with a single VA. In this paper, we introduce a Bayesian particle-based sum-product algorithm (SPA) for MP-SLAM in MIMO communication systems. Our method considers non-ideal reflective surfaces by jointly estimating the parameters of individual dispersion models for each detected surface in the delay and angle domains, leveraging multiple-measurement-to-feature data association. We demonstrate that the proposed SLAM method can robustly and jointly estimate the positions and dispersion extents of ideal and non-ideal reflective surfaces using numerical simulation.

[60]  arXiv:2404.15394 [pdf, ps, other]
Title: On Generating Cancelable Biometric Template using Reverse of Boolean XOR
Authors: Manisha, Nitin Kumar
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Cancelable Biometric is a repetitive distortion embedded in the original Biometric image to keep it secure from unauthorized access. In this paper, we generate Cancelable Biometric templates with the Reverse Boolean XOR technique. Three different methods are proposed for generating Cancelable Biometric templates based on a Visual Secret Sharing scheme. In each method, one Secret image and n-1 Cover images are used as follows: (M1) one original Biometric image (Secret) with n-1 randomly chosen Gray Cover images; (M2) one original Secret image with n-1 Cover images that are Randomly Permuted versions of the original Secret image; (M3) one Secret image with n-1 Cover images, where both the Secret image and the Cover images are Randomly Permuted versions of the original Biometric image. Experiments were performed on the publicly available ORL Face database and IIT Delhi Iris database. The performance of the proposed methods is compared in terms of Correlation Coefficient (Cr), Mean Square Error (MSE), Mean Absolute Error (MAE), Structural Similarity (SSIM), Peak Signal to Noise Ratio (PSNR), Number of Pixel Change Rate (NPCR), and Unified Average Changing Intensity (UACI). It is found that among the three proposed methods, M3 generates good quality Cancelable templates and gives the best performance in terms of quality. M3 is also better in quantitative terms on the ORL dataset, while M2 and M3 are comparable on the IIT Delhi Iris dataset.
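
The abstract does not spell out the Reverse Boolean XOR construction, so the sketch below shows only the generic property that XOR-based sharing schemes rely on: XORing a secret with n-1 cover images yields a template, and XORing that template with the same covers returns the secret. It is an assumed illustration rather than any of the methods M1 to M3, and the function names and pixel values are hypothetical.

```python
def xor_images(images):
    """XOR equally sized images (flattened 8-bit pixel sequences) element-wise."""
    length = len(images[0])
    if any(len(img) != length for img in images):
        raise ValueError("all images must have the same number of pixels")
    result = bytearray(length)
    for img in images:
        for idx, pixel in enumerate(img):
            result[idx] ^= pixel
    return bytes(result)


def make_template(secret, covers):
    """Combine the secret with every cover image to form a protected template."""
    return xor_images([secret] + list(covers))


def recover_secret(template, covers):
    """XOR is its own inverse, so applying the same covers restores the secret."""
    return xor_images([template] + list(covers))


# Three-pixel toy images (illustrative values only), with n = 3.
secret = bytes([10, 200, 37])
covers = [bytes([1, 2, 3]), bytes([255, 0, 128])]
template = make_template(secret, covers)
recovered = recover_secret(template, covers)
```

Replacing the cover images produces a different template from the same biometric, which is the revocability that cancelable schemes aim for.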

[61]  arXiv:2404.15508 [pdf, other]
Title: Joint Soil and Above-Ground Biomass Characterization Using Radars
Subjects: Signal Processing (eess.SP)

Soil moisture sensing through biomass or vegetation canopy has challenged researchers, even those who use SAR sensors with penetration capabilities. This is mainly due to the extra time and phase offsets imposed on Radio Frequency (RF) signals as they travel through the canopy. These offsets depend on the vegetation canopy moisture and height, both of which are typically unknown in agricultural and forest fields. In this paper, we leverage the mobility of an unmanned aerial system (UAS) to collect spatially-diverse radar measurements, enabling the joint estimation of soil moisture, above-ground biomass moisture, and biomass height, all without assuming any calibration steps. We leverage the changes in time-of-flight (ToF) and angle-of-arrival (AoA) measurements of reflected radar signals as the UAS flies above a reflector buried under the soil. We demonstrate the effectiveness of our algorithm by simulating its performance under realistic measurement noise as well as conducting lab experiments with different types of above-ground biomass. Our simulation results show that our algorithm is capable of estimating volumetric soil moisture to less than 1% median absolute error (MAE), vegetation height to 11.1 cm MAE, and vegetation relative permittivity to 0.32 MAE. Our experimental results demonstrate the effectiveness of the proposed method in practical scenarios for varying biomass moistures and heights.

[62]  arXiv:2404.15512 [pdf, other]
Title: Deep Hankel matrices with random elements
Comments: L4DC 2024
Subjects: Systems and Control (eess.SY)

Willems' fundamental lemma enables a trajectory-based characterization of linear systems through data-based Hankel matrices. However, in the presence of measurement noise, we ask: Is this noisy Hankel-based model expressive enough to re-identify itself? In other words, we study the output prediction accuracy from recursively applying the same persistently exciting input sequence to the model. We find an asymptotic connection to this self-consistency question in terms of the amount of data. More importantly, we also connect this question to the depth (number of rows) of the Hankel model, showing the simple act of reconfiguring a finite dataset significantly improves accuracy. We apply these insights to find a parsimonious depth for LQR problems over the trajectory space.
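The "depth" mentioned above is the number of rows in the Hankel matrix built from a recorded trajectory. The following is a minimal sketch of how one finite dataset can be reconfigured into a shallow or a deep Hankel matrix. It is not the paper's implementation; the helper name `hankel_matrix` and the toy scalar signal are assumptions made only for illustration.

```python
def hankel_matrix(signal, depth):
    """Build a Hankel matrix with `depth` rows from a scalar sequence.

    Entry (i, j) equals signal[i + j], so each column is a length-`depth`
    window of the data and consecutive columns are shifted by one sample.
    """
    n_cols = len(signal) - depth + 1
    if depth < 1 or n_cols < 1:
        raise ValueError("depth must be between 1 and len(signal)")
    return [[signal[i + j] for j in range(n_cols)] for i in range(depth)]


# The same six samples arranged at two different depths (toy data).
data = [1, 2, 3, 4, 5, 6]
shallow = hankel_matrix(data, 2)  # 2 rows, 5 columns
deep = hankel_matrix(data, 4)     # 4 rows, 3 columns
for row in deep:
    print(row)
```

For a fixed dataset, increasing the depth trades columns for rows: longer trajectory windows, but fewer of them.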

[63]  arXiv:2404.15533 [pdf, other]
Title: Designing, simulating, and performing the 100-AV field test for the CIRCLES consortium: Methodology and Implementation of the Largest mobile traffic control experiment to date
Subjects: Systems and Control (eess.SY)

Previous controlled experiments on single-lane ring roads have shown that a single partially autonomous vehicle (AV) can effectively mitigate traffic waves. This naturally prompts the question of how these findings can be generalized to field operational, high-density traffic conditions. To address this question, the Congestion Impacts Reduction via CAV-in-the-loop Lagrangian Energy Smoothing (CIRCLES) Consortium conducted MegaVanderTest (MVT), a live traffic control experiment involving 100 vehicles near Nashville, TN, USA. This article is a tutorial for developing analytical and simulation-based tools essential for designing and executing a live traffic control experiment like the MVT. It presents an overview of the proposed roadmap and various procedures used in designing, monitoring, and conducting the MVT, which is the largest mobile traffic control experiment to date. The design process is aimed at evaluating the impact of the CIRCLES AVs on surrounding traffic. The article discusses the agent-based traffic simulation framework created for this evaluation. A novel methodological framework is introduced to calibrate this microsimulation, aiming to accurately capture traffic dynamics and assess the impact of adding 100 vehicles to existing traffic. The calibration model's effectiveness is verified using data from a six-mile section of Nashville's I-24 highway. The results indicate that the proposed model establishes an effective feedback loop between the optimizer and the simulator, thereby calibrating flow and speed with different spatiotemporal characteristics to minimize the error between simulated and real-world data. Finally, we simulate AVs in multiple scenarios to assess their effect on traffic congestion. This evaluation validates the AV routes, thereby contributing to the execution of a safe and successful live traffic control experiment via AVs.

[64]  arXiv:2404.15584 [pdf, ps, other]
Title: Research on OPF control of three-phase four-wire low-voltage distribution network considering uncertainty
Comments: systems optimization, robust optimization, local control
Subjects: Systems and Control (eess.SY)

As power systems become more complex and uncertain, low-voltage distribution networks face numerous challenges, including three-phase imbalances caused by asymmetrical loads and distributed energy resources. To address these issues, we propose a robust stochastic optimization (RSO) based optimal power flow (OPF) control method that accounts for uncertainty in three-phase, four-wire low-voltage distribution networks. Using historical data and deep learning classification methods, the proposed method simulates optimal system behaviour without requiring communication infrastructure. The simulation results verify that the proposed method effectively controls the voltage and current amplitude while minimizing the operational cost and keeping the three-phase imbalance within acceptable limits. The proposed method shows promise for managing uncertainties and optimizing performance in low-voltage distribution networks.

[65]  arXiv:2404.15609 [pdf, other]
Title: Dynamic fault detection and diagnosis for alkaline water electrolyzer with variational Bayesian Sparse principal component analysis
Journal-ref: Journal of Process Control, 135:103173, March 2024. ISSN 0959-1524
Subjects: Systems and Control (eess.SY)

Electrolytic hydrogen production serves as not only a vital source of green hydrogen but also a key strategy for addressing renewable energy consumption challenges. For the safe production of hydrogen through an alkaline water electrolyzer (AWE), dependable process monitoring technology is essential. However, random noise can easily contaminate the AWE process data collected in industrial settings, presenting new challenges for monitoring methods. In this study, we develop the variational Bayesian sparse principal component analysis (VBSPCA) method for process monitoring. VBSPCA methods based on a Gaussian prior and a Laplace prior are derived to obtain the sparsity of the projection matrix, which correspond to $\ell_2$ regularization and $\ell_1$ regularization, respectively. The correlation of dynamic latent variables is then analyzed by sparse autoregression and fault variables are diagnosed by fault reconstruction. The effectiveness of the method is verified on an industrial hydrogen production process, and the test results demonstrate that both Gaussian prior and Laplace prior based VBSPCA can effectively detect and diagnose critical faults in AWEs.
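The stated correspondence between priors and regularizers follows from the standard maximum a posteriori argument: the negative log of the prior becomes the penalty term. The following is a generic sketch for a weight vector, not the paper's variational derivation. The symbols $\mathbf{w}$ (a column of the projection matrix), $\sigma$ (Gaussian standard deviation), and $b$ (Laplace scale) are assumptions introduced only for illustration.

```latex
% Zero-mean isotropic Gaussian prior -> squared l2 penalty
p(\mathbf{w}) \propto \exp\!\left(-\frac{\|\mathbf{w}\|_2^2}{2\sigma^2}\right)
\quad\Longrightarrow\quad
-\log p(\mathbf{w}) = \frac{1}{2\sigma^2}\,\|\mathbf{w}\|_2^2 + \text{const}

% Zero-mean Laplace prior with independent entries -> l1 penalty
p(\mathbf{w}) \propto \exp\!\left(-\frac{\|\mathbf{w}\|_1}{b}\right)
\quad\Longrightarrow\quad
-\log p(\mathbf{w}) = \frac{1}{b}\,\|\mathbf{w}\|_1 + \text{const}
```

Under these assumptions, a smaller $\sigma$ or $b$ means a stronger penalty. The $\ell_1$ term is the one usually associated with exactly zero entries in a point estimate.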

[66]  arXiv:2404.15620 [pdf, other]
Title: A Dynamic Kernel Prior Model for Unsupervised Blind Image Super-Resolution
Subjects: Image and Video Processing (eess.IV)

Deep learning-based methods have achieved significant success in solving the blind super-resolution (BSR) problem. However, most of them require supervised pre-training on labelled datasets. This paper proposes an unsupervised kernel estimation model, named dynamic kernel prior (DKP), to realize an unsupervised and pre-training-free learning-based algorithm for solving the BSR problem. DKP can adaptively learn dynamic kernel priors to realize real-time kernel estimation, and thereby enables superior HR image restoration performance. This is achieved by a Markov chain Monte Carlo sampling process on random kernel distributions. The learned kernel prior is then assigned to optimize a blur kernel estimation network, which entails a network-based Langevin dynamic optimization strategy. These two techniques ensure the accuracy of the kernel estimation. DKP can easily be used to replace the kernel estimation models in existing methods, such as Double-DIP and FKP-DIP, or be added to an off-the-shelf image restoration model, such as a diffusion model. In this paper, we incorporate our DKP model with DIP and a diffusion model, referred to as DIP-DKP and Diff-DKP, for validation. Extensive simulations on Gaussian and motion kernel scenarios demonstrate that the proposed DKP model can significantly improve the kernel estimation with comparable runtime and memory usage, leading to state-of-the-art BSR results. The code is available at https://github.com/XYLGroup/DKP.

[67]  arXiv:2404.15718 [pdf, other]
Title: Mitigating False Predictions In Unreasonable Body Regions
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Despite considerable strides in developing deep learning models for 3D medical image segmentation, the challenge of effectively generalizing across diverse image distributions persists. While domain generalization is acknowledged as vital for robust application in clinical settings, the challenges stemming from training with a limited Field of View (FOV) remain unaddressed. This limitation leads to false predictions when applied to body regions beyond the FOV of the training data. In response to this problem, we propose a novel loss function that penalizes predictions in implausible body regions, applicable in both single-dataset and multi-dataset training schemes. It is realized with a Body Part Regression model that generates axial slice positional scores. Through comprehensive evaluation using a test set featuring varying FOVs, our approach demonstrates remarkable improvements in generalization capabilities. It reduces false-positive tumor predictions by up to 85% and significantly enhances overall segmentation performance.

[68]  arXiv:2404.15750 [pdf, other]
Title: A Reconfigurable Subarray Architecture and Hybrid Beamforming for Millimeter-Wave Dual-Function-Radar-Communication Systems
Comments: 14 pages, 9 figures, Accepted by IEEE TWC
Subjects: Signal Processing (eess.SP)

Dual-function-radar-communication (DFRC) is a promising candidate technology for next-generation networks. By integrating hybrid analog-digital (HAD) beamforming into a multi-user millimeter-wave (mmWave) DFRC system, we design a new reconfigurable subarray (RS) architecture and jointly optimize the HAD beamforming to maximize the communication sum-rate and ensure a prescribed signal-to-clutter-plus-noise ratio for radar sensing. Considering the non-convexity of this problem arising from multiplicative coupling of the analog and digital beamforming, we convert the sum-rate maximization into an equivalent weighted mean-square error minimization and apply penalty dual decomposition to decouple the analog and digital beamforming. Specifically, a second-order cone program is first constructed to optimize the fully digital counterpart of the HAD beamforming. Then, the sparsity of the RS architecture is exploited to obtain a low-complexity solution for the HAD beamforming. The convergence and complexity analyses of our algorithm are carried out under the RS architecture. Simulations corroborate that, with the RS architecture, DFRC offers effective communication and sensing and improves energy efficiency by 83.4% and 114.2% with a moderate number of radio frequency chains and phase shifters, compared to the persistently- and fully-connected architectures, respectively.

[69]  arXiv:2404.15761 [pdf, other]
Title: Rechargeable UAV Trajectory Optimization for Real-Time Persistent Data Collection of Large-Scale Sensor Networks
Comments: 6 pages, 6 figures, accepted by IEEE ICC Workshops 2024
Subjects: Signal Processing (eess.SP)

Continuous real-time data collection in wireless sensor networks is crucial for facilitating timely decision-making and environmental monitoring. Unmanned aerial vehicles (UAVs) have received plenty of attention for collecting data efficiently due to their high flexibility and enhanced communication ability. Nonetheless, the limited onboard energy restricts UAVs' application to persistent missions, such as disaster search and rescue. In this paper, we propose a rechargeable UAV-assisted periodic data collection scheme, where the UAV replenishes energy through the wireless charging platform during the mission to provide persistent information services for the sensor nodes (SNs). Specifically, the total completion time is minimized by optimizing the trajectory of the UAV to balance the collecting time, flight time, and recharging time. However, optimally solving this problem is highly non-trivial due to the non-convex constraints and the involved integer variables. To address this issue, the formulated problem is decomposed into two subproblems, namely, UAV data collection trajectory optimization, and SN clustering and UAV visiting order optimization. By exploiting convex optimization techniques and proving that the total time is non-decreasing with the cluster number, a periodic trajectory optimization algorithm based on successive convex approximation (SCA) and bisection search is proposed to solve the main problem. The simulation results show the efficiency of the proposed scheme in practical scenarios, and the completion time of the proposed algorithm is on average 39% and 33% lower than the two benchmarks, respectively.

[70]  arXiv:2404.15786 [pdf, other]
Title: Rethinking Model Prototyping through the MedMNIST+ Dataset Collection
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

The integration of deep learning-based systems in clinical practice is often impeded by challenges rooted in limited and heterogeneous medical datasets. In addition, prioritization of marginal performance improvements on a few, narrowly scoped benchmarks over clinical applicability has slowed down meaningful algorithmic progress. This trend often results in excessive fine-tuning of existing methods to achieve state-of-the-art performance on selected datasets rather than fostering clinically relevant innovations. In response, this work presents a comprehensive benchmark for the MedMNIST+ database to diversify the evaluation landscape and conduct a thorough analysis of common convolutional neural networks (CNNs) and Transformer-based architectures for medical image classification. Our evaluation encompasses various medical datasets, training methodologies, and input resolutions, aiming to reassess the strengths and limitations of widely used model variants. Our findings suggest that computationally efficient training schemes and modern foundation models hold promise in bridging the gap between expensive end-to-end training and more resource-refined approaches. Additionally, contrary to prevailing assumptions, we observe that higher resolutions may not consistently improve performance beyond a certain threshold, advocating for the use of lower resolutions, particularly in prototyping stages, to expedite processing. Notably, our analysis reaffirms the competitiveness of convolutional models compared to ViT-based architectures, emphasizing the importance of comprehending the intrinsic capabilities of different model architectures. Moreover, we hope that our standardized evaluation framework will help enhance transparency, reproducibility, and comparability on the MedMNIST+ dataset collection as well as future research within the field. Code will be released soon.

[71]  arXiv:2404.15796 [pdf, ps, other]
Title: Achievements, Experiences, and Lessons Learned from the European Research Infrastructure ERIGrid related to the Validation of Power and Energy Systems
Journal-ref: e & i Elektrotechnik und Informationstechnik, 2020
Subjects: Systems and Control (eess.SY)

Power system operation is of vital importance and must be developed far beyond today's practice to meet future needs. Almost all European countries are facing an abrupt and very important increase of renewables with intrinsically varying yields which are difficult to predict. In addition, an increase of new types of electric loads and a reduction of traditional production from bulk generation can be observed as well. Hence, the level of complexity of system operation steadily increases. Because of these developments, the traditional power system is being transformed into a smart grid. Previous and ongoing research has tended to focus on how specific aspects of smart grids can be developed and validated, but until now there exists no integrated approach for analysing and evaluating complex smart grid configurations. To tackle these research and development needs, a pan-European research infrastructure is realized in the ERIGrid project that supports the technology development as well as the roll out of smart grid technologies and solutions. This paper provides an overview of the main results of ERIGrid which have been achieved during the last four years. Also, experiences and lessons learned are discussed and an outlook to future research needs is provided.

[72]  arXiv:2404.15798 [pdf, ps, other]
Title: Recent Activities of a European Union Joint Research Project on Metrology for Emerging Wireless Standards
Comments: 6 pages, 10 figures, the 45th Annual Meeting and Symposium of the Antenna Measurement Techniques Association (AMTA 2023)
Subjects: Signal Processing (eess.SP); Systems and Control (eess.SY)

Emerging wireless technologies with Gbps connectivity, such as the 5th generation (5G) and 6th generation (6G) of mobile networks, require improved and substantiating documentation for the wireless standards concerning the radio signals, systems, transmission environments used, and the radio frequency exposures created. Current challenges faced by the telecommunications sector include the lack of accurate, fast, low-cost, and traceable methods for manufacturers to demonstrate 5G and 6G product verifications matching customer specifications. This paper gives an update on the recent research and development activities from an EU Joint Research Project entitled metrology for emerging wireless standards (MEWS) in support of the above.

[73]  arXiv:2404.15830 [pdf, other]
Title: SNR Maximization and Localization for UAV-IRS-Assisted Near-Field Systems
Comments: 5 pages, 3 figures
Subjects: Signal Processing (eess.SP)

This letter introduces a novel unmanned aerial vehicle (UAV)-intelligent reflecting surface (IRS) structure into near-field localization systems to enhance the design flexibility of IRS, thereby obtaining additional performance gains. Specifically, a UAV-IRS is utilized to improve the harsh wireless environment and provide localization possibilities. To improve the localization accuracy, a joint optimization problem considering UAV position and UAV-IRS passive beamforming is formulated to maximize the received signal-to-noise ratio (SNR). An alternating optimization algorithm is proposed to solve the complex non-convex problem, leveraging the projected gradient ascent (PGA) algorithm and the principle of minimizing the phase difference of the received signals. Closed-form expressions for the UAV-IRS phase shift are derived to reduce the algorithm complexity. In the simulations, the proposed algorithm is compared with three different schemes and outperforms the others in both received SNR and localization accuracy.

[74]  arXiv:2404.15880 [pdf, other]
Title: Machine Learning for Pre/Post Flight UAV Rotor Defect Detection Using Vibration Analysis
Comments: Submitted to IEEE GlobeCom 2024
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

Unmanned Aerial Vehicles (UAVs) will be critical infrastructural components of future smart cities. In order to operate efficiently, UAV reliability must be ensured by constant monitoring for faults and failures. To this end, the work presented in this paper leverages signal processing and Machine Learning (ML) methods to analyze the data of a comprehensive vibrational analysis to determine the presence of rotor blade defects during pre- and post-flight operation. With the help of dimensionality reduction techniques, the Random Forest algorithm exhibited the best performance and detected defective rotor blades perfectly. Additionally, a comprehensive analysis of the impact of various feature subsets is presented to gain insight into the factors affecting the model's classification decision process.

[75]  arXiv:2404.15918 [pdf, other]
Title: Perception and Localization of Macular Degeneration Applying Convolutional Neural Network, ResNet and Grad-CAM
Comments: 12 pages, 5 figures, 2 tables
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Macular Degeneration is a well-known retinal disease that causes blurry vision in affected patients. This research classifies healthy and macular degeneration fundus images and localizes the affected region of the fundus. A CNN architecture and a CNN with a ResNet architecture (ResNet50, ResNet50v2, ResNet101, ResNet101v2, ResNet152, ResNet152v2) as the backbone are used to classify the two types of fundus. The data are split in three ways: (a) 90% training set and 10% testing set, (b) 80% training set and 20% testing set, and (c) 50% training set and 50% testing set. After training, the best model is selected based on the evaluation metrics. Among the models, the CNN with a ResNet50 backbone performs best, giving a training accuracy of 98.7% for the 90% train and 10% test data split. With this model, we perform Grad-CAM visualization to obtain the affected region of the fundus.

[76]  arXiv:2404.15936 [pdf, other]
Title: Accurate Direct Positioning in Distributed MIMO Using Delay-Doppler Channel Measurements
Comments: 5 pages, 3 figures, submitted to IEEE SPAWC 2024
Subjects: Signal Processing (eess.SP)

Distributed multiple-input multiple-output (D-MIMO) is a promising technology for simultaneous communication and positioning. However, phase synchronization between multiple access points in D-MIMO is challenging, which calls for methods that function without phase synchronization. We therefore present a method for D-MIMO that performs direct positioning of a moving device based on the delay-Doppler characteristics of the channel state information (CSI). Our method relies on particle-filter-based Bayesian inference with a state-space model. We use recent measurements from a sub-6 GHz D-MIMO OFDM system in an industrial environment to demonstrate centimeter accuracy under partial line-of-sight (LoS) conditions and decimeter accuracy under full non-LoS.

[77]  arXiv:2404.15951 [pdf, other]
Title: Input-Output Specifications of Grid-Forming Functions and Data-Driven Verification Methods
Subjects: Systems and Control (eess.SY)

This work investigates interoperability and performance specifications for converter interfaced generation (CIG) that can be verified using only input-output data. First, we develop decentralized conditions on frequency stability that account for network circuit dynamics and can be verified using CIG terminal dynamics and a few key network parameters. Next, we formalize performance specifications that impose requirements on the CIG disturbance response. A simple data-driven validation method is presented that enables verification of the interoperability and performance specifications for CIG using input-output data from a two-node system. Data obtained from electromagnetic transient (EMT) simulations are used to illustrate the proposed approach and the impact of key parameters such as inner control loop gains, network coupling strength, and controller bandwidth limitations.

[78]  arXiv:2404.15958 [pdf, other]
Title: Platooning of Heterogeneous Vehicles with Actuation Delays: Theoretical and Experimental Results
Subjects: Systems and Control (eess.SY)

In this paper, we present a prediction-based Cooperative Adaptive Cruise Controller for vehicles with actuation delay, applicable within heterogeneous platoons. We provide a stability analysis for the discrete-time implementation of this controller, which shows the effect of the sampling times used and can be used for selecting appropriate controller gains. The theoretical results are validated by means of experiments using full-scale vehicles. This is an extended version of a paper with the same title (submitted to IFAC TDS 2024). Additional mathematical details are provided in this extended version.

[79]  arXiv:2404.15961 [pdf, other]
Title: Soil analysis with machine-learning-based processing of stepped-frequency GPR field measurements: Preliminary study
Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI)

Ground Penetrating Radar (GPR) has been widely studied as a tool for extracting soil parameters relevant to agriculture and horticulture. When combined with Machine-Learning-based (ML) methods, high-resolution Stepped Frequency Continuous Wave Radar (SFCW) measurements hold the promise to give cost-effective access to depth-resolved soil parameters, including at root-level depth. In a first step in this direction, we perform an extensive field survey with a tractor-mounted SFCW GPR instrument. Using ML data processing, we test the GPR instrument's capabilities to predict the apparent electrical conductivity (ECaR) as measured by a simultaneously recording Electromagnetic Induction (EMI) instrument. The large-scale field measurement campaign with 3472 co-registered and geo-located GPR and EMI data samples distributed over ~6600 square meters was performed on a golf course. The selected terrain benefits from a high surface homogeneity, but also features the challenge of only small, and hence hard to discern, variations in the measured soil parameter. Based on the quantitative results, we suggest the use of the nugget-to-sill ratio as a performance metric for the evaluation of end-to-end ML performance in the agricultural setting and discuss the limiting factors in the multi-sensor regression setting. The code is released as open source and available at https://opensource.silicon-austria.com/xuc/soil-analysis-machine-learning-stepped-frequency-gpr.

[80]  arXiv:2404.15962 [pdf, other]
Title: Showcasing Automated Vehicle Prototypes: A Collaborative Release Process to Manage and Communicate Risk
Comments: submitted for publication
Subjects: Systems and Control (eess.SY)

The development and deployment of automated vehicles pose major challenges for manufacturers to this day. Whilst central questions, like the issue of ensuring a sufficient level of safety, remain unanswered, prototypes are increasingly finding their way into public traffic in urban areas. Although safety concepts for prototypes are addressed in literature, published work hardly contains any dedicated considerations on a systematic release for their operation. In this paper, we propose an incremental release process for public demonstrations of prototypes' automated driving functionality. We explicate release process requirements, derive process design decisions, and define stakeholder tasks. Furthermore, we reflect on practical insights gained through implementing the release process as part of the UNICAR$agil$ research project, in which four prototypes based on novel vehicle concepts were built and demonstrated to the public. One observation is the improved quality of internal risk communication, achieved by dismantling information asymmetries between stakeholders. Design conflicts are disclosed, which helps nurture transparency and thereby supports a valid basis for release decisions. We argue that our release process meets two important requirements, as the results suggest its applicability to the domain of automated driving and its scalability to different vehicle concepts and organizational structures.

[81]  arXiv:2404.15978 [pdf, other]
Title: Learning deep Koopman operators with convex stability constraints
Comments: 7 pages, 3 figures, 1 table, submitted to IEEE Conference on Decision and Control (CDC) 2024
Subjects: Systems and Control (eess.SY)

In this paper, we present a novel sufficient condition for the stability of discrete-time linear systems that can be represented as a set of piecewise linear constraints, which makes it suitable for quadratic programming optimization problems. More specifically, we tackle the problem of imposing asymptotic stability on a Koopman matrix learned from data during iterative gradient descent optimization processes. We show that this sufficient condition can be decoupled by rows of the system matrix, and propose a control barrier function-based projected gradient descent to enforce gradual evolution towards the stability set by running an optimization-in-the-loop during the iterative learning process. We compare the performance of our algorithm with two other recent approaches in the literature, and show that we get close to state-of-the-art performance while providing the added flexibility of allowing the optimization problem to be further customized for specific applications.

Cross-lists for Thu, 25 Apr 24

[82]  arXiv:2404.04879 (cross-list from cs.RO) [pdf, other]
Title: Multi-Type Map Construction via Semantics-Aware Autonomous Exploration in Unknown Indoor Environments
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

This paper proposes a novel semantics-aware autonomous exploration model to handle a long-standing issue: the mainstream RRT (Rapidly-exploring Random Tree) based exploration models usually make the mobile robot switch frequently between different regions, leading to excessively repeated exploration of the same region. Our proposed semantics-aware model encourages a mobile robot to fully explore the current region before moving to the next region, which avoids excessively repeated exploration and makes the exploration faster. The core idea of the semantics-aware autonomous exploration model is to optimize the sampling point selection mechanism and the frontier point evaluation function by considering the semantic information of regions. In addition, compared with existing autonomous exploration methods that usually construct a single type or 2-3 types of maps, our model can construct four kinds of maps: a point cloud map, an occupancy grid map, a topological map, and a semantic map. To test the performance of our model, we conducted experiments in three simulated environments. The experimental results demonstrate that, compared to Improved RRT, our model achieved a 33.0% reduction in exploration time and a 39.3% reduction in exploration trajectory length while maintaining a >98% exploration rate.

[83]  arXiv:2404.15296 (cross-list from math.NA) [pdf, other]
Title: Maximum Discrepancy Generative Regularization and Non-Negative Matrix Factorization for Single Channel Source Separation
Comments: arXiv admin note: substantial text overlap with arXiv:2305.01758
Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG); Signal Processing (eess.SP); Machine Learning (stat.ML)

The idea of adversarial learning of regularization functionals has recently been introduced in the wider context of inverse problems. The intuition behind this method is the realization that it is not only necessary to learn the basic features that make up a class of signals one wants to represent, but also, or even more so, which features to avoid in the representation. In this paper, we apply this approach to the training of generative models, leading to what we call Maximum Discrepancy Generative Regularization. In particular, we apply this to the problem of source separation by means of Non-negative Matrix Factorization (NMF) and present a new method for the adversarial training of NMF bases. We show in numerical experiments, both for image and audio separation, that this leads to a clear improvement of the reconstructed signals, in particular in the case where little or no strong supervision data is available.

[84]  arXiv:2404.15446 (cross-list from cs.CR) [pdf, other]
Title: OffRAMPS: An FPGA-based Intermediary for Analysis and Modification of Additive Manufacturing Control Systems
Subjects: Cryptography and Security (cs.CR); Systems and Control (eess.SY)

Cybersecurity threats in Additive Manufacturing (AM) are an increasing concern as AM adoption continues to grow. AM is now being used for parts in the aerospace, transportation, and medical domains. Threat vectors which allow for part compromise are particularly concerning, as any failure in these domains would have life-threatening consequences. A major challenge to investigation of AM part-compromises comes from the difficulty in evaluating and benchmarking both identified threat vectors as well as methods for detecting adversarial actions. In this work, we introduce a generalized platform for systematic analysis of attacks against and defenses for 3D printers. Our "OFFRAMPS" platform is based on the open-source 3D printer control board "RAMPS." OFFRAMPS allows analysis, recording, and modification of all control signals and I/O for a 3D printer. We show the efficacy of OFFRAMPS by presenting a series of case studies based on several Trojans, including ones identified in the literature, and show that OFFRAMPS can both emulate and detect these attacks, i.e., it can both change and detect arbitrary changes to the g-code print commands.

[85]  arXiv:2404.15469 (cross-list from cs.IT) [pdf, other]
Title: NMBEnet: Efficient Near-field mmWave Beam Training for Multiuser OFDM Systems Using Sub-6 GHz Pilots
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

Combining millimetre-wave (mmWave) communications with an extremely large-scale antenna array (ELAA) presents a promising avenue for meeting the spectral efficiency demands of the future sixth generation (6G) mobile communications. However, beam training for mmWave ELAA systems is challenged by excessive pilot overheads as well as insufficient accuracy, as the huge near-field codebook has to be accounted for. In this paper, inspired by the similarity between far-field sub-6 GHz channels and near-field mmWave channels, we propose to leverage sub-6 GHz uplink pilot signals to directly estimate the optimal near-field mmWave codeword, which aims to reduce pilot overhead and bypass the channel estimation. Moreover, we adopt deep learning to perform this dual mapping function, i.e., sub-6 GHz to mmWave, far-field to near-field, and a novel neural network structure called NMBEnet is designed to enhance the precision of beam training. Specifically, when considering the orthogonal frequency division multiplexing (OFDM) communication scenarios with high user density, correlations arise both between signals from different users and between signals from different subcarriers. Accordingly, the convolutional neural network (CNN) module and graph neural network (GNN) module included in the proposed NMBEnet can leverage these two correlations to further enhance the precision of beam training.

[86]  arXiv:2404.15530 (cross-list from cs.IT) [pdf, other]
Title: Co-existing/Cooperating Multicell Massive MIMO and Cell-Free Massive MIMO Deployments: Heuristic Designs and Performance Analysis
Comments: Paper submitted to the IEEE Open Journal of the Communications Society
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

Cell-free massive MIMO (CF-mMIMO) represents a deeply investigated evolution from the conventional multicell co-located massive MIMO (MC-mMIMO) network deployments. Anticipating a gradual integration of CF-mMIMO systems alongside pre-existing MC-mMIMO network elements, this paper considers a scenario where both deployments coexist in order to serve a large number of users using a shared set of frequencies. The investigation explores the impact of this coexistence on the network's downlink performance, considering various degrees of mutual cooperation, precoder selection, and power control strategies. Moreover, to take into account the effect of the proposed cooperation scenarios on the fronthaul links, this paper also provides a fronthaul-aware heuristic association algorithm between users and network elements, which permits fulfilling the fronthaul requirement on each link. The research is completed by extensive simulations, shedding light on the performance outcomes associated with the diverse cooperation levels and several solutions delineated in the paper.

[87]  arXiv:2404.15575 (cross-list from astro-ph.IM) [pdf, other]
Title: Jitter Characterization of the HyTI Satellite
Comments: Accepted for the 2024 IEEE Aerospace Conference Proceedings
Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); Systems and Control (eess.SY)

The Hyperspectral Thermal Imager (HyTI) is a technology demonstration mission that will obtain high spatial, spectral, and temporal resolution long-wave infrared images of Earth's surface from a 6U cubesat. HyTI science requires that the pointing accuracy of the optical axis shall not exceed 2.89 arcsec over the 0.5 ms integration time due to microvibration effects (known as jitter). Two sources of vibration are a cryocooler that is added to maintain the detector at 68 K and three orthogonally placed reaction wheels that are a part of the attitude control system. Both of these parts will introduce vibrations that are propagated through the satellite structure while imaging. Typical methods of characterizing and measuring jitter involve complex finite element methods and specialized equipment and setups. In this paper, we describe a novel method of characterizing jitter for small satellite systems that is low-cost and minimally modifies the subject's mass distribution. The metrology instrument comprises a laser source, a small mirror mounted via a 3D printed clamp to a jig, and a lateral effect position-sensing detector. The position-sensing detector samples at 1000 Hz and can measure displacements as small as 0.15 arcsec at distances of one meter. This paper provides an experimental procedure that incrementally analyzes vibratory sources to establish causal relationships between sources and the vibratory modes they create. We demonstrate the capabilities of this metrology system and testing procedure on HyTI in the Hawaii Space Flight Lab's clean room. Results include power spectral density plots that show fundamental and higher-order vibratory modal frequencies. Results from metrology show that jitter from reaction wheels meets HyTI system requirements within 3$\sigma$.
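
To give a sense of scale for the angular figures quoted above, the following is a minimal sketch that converts an angle in arcseconds to the lateral displacement it subtends at a given distance. The function name and the simple geometry are assumptions made for illustration: the abstract does not describe the optical layout, and effects such as the angle doubling on reflection from the mirror are ignored here.

```python
import math

# One arcsecond expressed in radians (1 degree = 3600 arcsec).
ARCSEC_TO_RAD = math.pi / (180.0 * 3600.0)


def arcsec_to_displacement_m(angle_arcsec, distance_m):
    """Lateral displacement (in meters) subtended by a small angle
    at a given distance. Hypothetical helper, not from the paper."""
    return distance_m * math.tan(angle_arcsec * ARCSEC_TO_RAD)


# Quoted detector resolution: 0.15 arcsec at one meter.
resolution_m = arcsec_to_displacement_m(0.15, 1.0)
# Quoted pointing requirement: 2.89 arcsec, evaluated at the same distance.
requirement_m = arcsec_to_displacement_m(2.89, 1.0)

print(f"0.15 arcsec at 1 m is about {resolution_m * 1e6:.3f} micrometers")
print(f"2.89 arcsec at 1 m is about {requirement_m * 1e6:.3f} micrometers")
```

Under these assumptions, the quoted resolution corresponds to a sub-micrometer displacement at one meter, roughly a factor of 19 finer than the 2.89 arcsec requirement.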

[88]  arXiv:2404.15585 (cross-list from cs.LG) [pdf, other]
Title: Brain Storm Optimization Based Swarm Learning for Diabetic Retinopathy Image Classification
Subjects: Machine Learning (cs.LG); Image and Video Processing (eess.IV)

The application of deep learning techniques to medical problems has garnered widespread research interest in recent years, such as applying convolutional neural networks to medical image classification tasks. However, data in the medical field is often highly private, preventing different hospitals from sharing data to train an accurate model. Federated learning, as a privacy-preserving machine learning architecture, has shown promising performance in balancing data privacy and model utility by keeping private data on the client's side and using a central server to coordinate a set of clients for model training through aggregating their uploaded model parameters. Yet, this architecture heavily relies on a trusted third-party server, which is challenging to achieve in real life. Swarm learning, as a specialized decentralized federated learning architecture that does not require a central server, utilizes blockchain technology to enable direct parameter exchanges between clients. However, the mining of blocks requires significant computational resources, limiting its scalability. To address this issue, this paper integrates the brain storm optimization algorithm into the swarm learning framework, named BSO-SL. This approach clusters similar clients into different groups based on their model distributions. Additionally, leveraging the architecture of BSO, clients are given the probability to engage in collaborative learning both within their cluster and with clients outside their cluster, preventing the model from converging to local optima. The proposed method has been validated on a real-world diabetic retinopathy image classification dataset, and the experimental results demonstrate the effectiveness of the proposed approach.

[89]  arXiv:2404.15591 (cross-list from cs.CV) [pdf, other]
Title: Domain Adaptation for Learned Image Compression with Supervised Adapters
Comments: 10 pages, published to Data compression conference 2024 (DCC2024)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

In Learned Image Compression (LIC), a model is trained to encode and decode images sampled from a source domain, often outperforming traditional codecs on natural images; yet its performance may be far from optimal on images sampled from different domains. In this work, we tackle the problem of adapting a pre-trained model to multiple target domains by plugging into the decoder an adapter module for each of them, including the source one. Each adapter improves the decoder performance on a specific domain, without the model forgetting about the images seen at training time. A gate network computes the weights to optimally blend the contributions from the adapters when the bitstream is decoded. We experimentally validate our method over two state-of-the-art pre-trained models, observing improved rate-distortion efficiency on the target domains without penalties on the source domain. Furthermore, the gate's ability to find similarities with the learned target domains enables better encoding efficiency also for images outside them.
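
The gate-weighted blending of adapter contributions described above can be illustrated with a small sketch. It assumes that the gate produces one score per adapter and that the scores are normalized with a softmax; the abstract does not state how the weights are computed, so the normalization, the function names, and the flat-list feature representation are all assumptions for illustration only.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def blend_adapters(adapter_outputs, gate_logits):
    """Weighted sum of per-domain adapter outputs.

    adapter_outputs: list of equally sized feature vectors, one per adapter.
    gate_logits: one score per adapter (assumed output of a gate network).
    """
    weights = softmax(gate_logits)
    length = len(adapter_outputs[0])
    return [
        sum(w * out[i] for w, out in zip(weights, adapter_outputs))
        for i in range(length)
    ]


# Two hypothetical adapters; equal gate scores give a plain average.
blended = blend_adapters([[1.0, 2.0], [3.0, 4.0]], [0.0, 0.0])
print(blended)
```

A strongly peaked gate score makes the blend collapse onto a single adapter, while similar scores mix several domains, which is one way to read the claim about images that fall outside the learned domains.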

[90]  arXiv:2404.15621 (cross-list from cs.ET) [pdf, ps, other]
Title: Layer Ensemble Averaging for Improving Memristor-Based Artificial Neural Network Performance
Subjects: Emerging Technologies (cs.ET); Hardware Architecture (cs.AR); Machine Learning (cs.LG); Image and Video Processing (eess.IV)

Artificial neural networks have advanced due to scaling dimensions, but conventional computing faces inefficiency due to the von Neumann bottleneck. In-memory computation architectures, like memristors, offer promise but face challenges due to hardware non-idealities. This work proposes and experimentally demonstrates layer ensemble averaging, a technique to map pre-trained neural network solutions from software to defective hardware crossbars of emerging memory devices and reliably attain near-software performance on inference. The approach is investigated using a custom 20,000-device hardware prototyping platform on a continual learning problem where a network must learn new tasks without catastrophically forgetting previously learned information. Results demonstrate that by trading off the number of devices required for layer mapping, layer ensemble averaging can reliably boost defective memristive network performance up to the software baseline. For the investigated problem, the average multi-task classification accuracy improves from 61 % to 72 % (< 1 % of software baseline) using the proposed approach.

[91]  arXiv:2404.15637 (cross-list from cs.SD) [pdf, other]
Title: HybridVC: Efficient Voice Style Conversion with Text and Audio Prompts
Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)

We introduce HybridVC, a voice conversion (VC) framework built upon a pre-trained conditional variational autoencoder (CVAE) that combines the strengths of a latent model with contrastive learning. HybridVC supports text and audio prompts, enabling more flexible voice style conversion. HybridVC models a latent distribution conditioned on speaker embeddings acquired by a pretrained speaker encoder and optimises style text embeddings to align with the speaker style information through contrastive learning in parallel. Therefore, HybridVC can be efficiently trained under limited computational resources. Our experiments demonstrate HybridVC's superior training efficiency and its capability for advanced multi-modal voice style conversion. This underscores its potential for widespread applications such as user-defined personalised voice in various social media platforms. A comprehensive ablation study further validates the effectiveness of our method.

[92]  arXiv:2404.15643 (cross-list from cs.IT) [pdf, ps, other]
Title: Dynamic Beam Coverage for Satellite Communications Aided by Movable-Antenna Array
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

Due to the ultra-dense constellation, efficient beam coverage and interference mitigation are crucial to low-earth orbit (LEO) satellite communication systems, while the conventional directional antennas and fixed-position antenna (FPA) arrays both have limited degrees of freedom (DoFs) in beamforming to adapt to the time-varying coverage requirement of terrestrial users. To address this challenge, we propose in this paper utilizing movable antenna (MA) arrays to enhance the satellite beam coverage and interference mitigation. Specifically, given the satellite orbit and the coverage requirement within a specific time interval, the antenna position vector (APV) and antenna weight vector (AWV) of the satellite-mounted MA array are jointly optimized over time to minimize the average signal leakage power to the interference area of the satellite, subject to the constraints of the minimum beamforming gain over the coverage area, the continuous movement of MAs, and the constant modulus of AWV. The corresponding continuous-time decision process for the APV and AWV is first transformed into a more tractable discrete-time optimization problem. Then, an alternating optimization (AO)-based algorithm is developed by iteratively optimizing the APV and AWV, where the successive convex approximation (SCA) technique is utilized to obtain locally optimal solutions during the iterations. Moreover, to further reduce the antenna movement overhead, a low-complexity MA scheme is proposed by using an optimized common APV over all time slots. Simulation results validate that the proposed MA array-aided beam coverage schemes can significantly decrease the interference leakage of the satellite compared to conventional FPA-based schemes, while the low-complexity MA scheme can achieve a performance comparable to the continuous-movement scheme.

[93]  arXiv:2404.15692 (cross-list from cs.LG) [pdf, other]
Title: Deep Learning for Accelerated and Robust MRI Reconstruction: a Review
Subjects: Machine Learning (cs.LG); Image and Video Processing (eess.IV)

Deep learning (DL) has recently emerged as a pivotal technology for enhancing magnetic resonance imaging (MRI), a critical tool in diagnostic radiology. This review paper provides a comprehensive overview of recent advances in DL for MRI reconstruction. It focuses on DL approaches and architectures designed to improve image quality, accelerate scans, and address data-related challenges. These include end-to-end neural networks, pre-trained networks, generative models, and self-supervised methods. The paper also discusses the role of DL in optimizing acquisition protocols, enhancing robustness against distribution shifts, and tackling subtle bias. Drawing on the extensive literature and practical insights, it outlines current successes, limitations, and future directions for leveraging DL in MRI reconstruction, while emphasizing the potential of DL to significantly impact clinical imaging practices.

[94]  arXiv:2404.15704 (cross-list from cs.LG) [pdf, other]
Title: Efficient Multi-Model Fusion with Adversarial Complementary Representation Learning
Comments: Accepted by the 2024 International Joint Conference on Neural Networks (IJCNN 2024)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Single-model systems often suffer from deficiencies in tasks such as speaker verification (SV) and image classification, relying heavily on partial prior knowledge during decision-making, resulting in suboptimal performance. Although multi-model fusion (MMF) can mitigate some of these issues, redundancy in learned representations may limit improvements. To this end, we propose an adversarial complementary representation learning (ACoRL) framework that enables newly trained models to avoid previously acquired knowledge, allowing each individual component model to learn maximally distinct, complementary representations. We provide three detailed explanations of why this works, and experimental results demonstrate that our method improves performance more efficiently than traditional MMF. Furthermore, attribution analysis validates that the model trained under ACoRL acquires more complementary knowledge, highlighting the efficacy of our approach in enhancing efficiency and robustness across tasks.

[95]  arXiv:2404.15774 (cross-list from cs.CV) [pdf, other]
Title: Toward Physics-Aware Deep Learning Architectures for LiDAR Intensity Simulation
Comments: 7 pages, 7 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)

Autonomous vehicles (AVs) heavily rely on LiDAR perception for environment understanding and navigation. LiDAR intensity provides valuable information about the reflected laser signals and plays a crucial role in enhancing the perception capabilities of AVs. However, accurately simulating LiDAR intensity remains a challenge due to the unavailability of material properties of the objects in the environment and the complex interactions between the laser beam and the environment. The proposed method aims to improve the accuracy of intensity simulation by incorporating physics-based modalities within the deep learning framework. One of the key entities that captures the interaction between the laser beam and the objects is the angle of incidence. In this work, we demonstrate that the addition of the LiDAR incidence angle as a separate input to the deep neural networks significantly enhances the results. We present a comparative study between two prominent deep learning architectures: U-NET, a Convolutional Neural Network (CNN), and Pix2Pix, a Generative Adversarial Network (GAN). We implemented these two architectures for the intensity prediction task and used the SemanticKITTI and VoxelScape datasets for experiments. The comparative analysis reveals that both architectures benefit from the incidence angle as an additional input. Moreover, the Pix2Pix architecture outperforms U-NET, especially when the incidence angle is incorporated.

[96]  arXiv:2404.15781 (cross-list from cs.CV) [pdf, other]
Title: Real-Time Compressed Sensing for Joint Hyperspectral Image Transmission and Restoration for CubeSat
Comments: Accepted by TGRS 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)

This paper addresses the challenges associated with hyperspectral image (HSI) reconstruction from miniaturized satellites, which often suffer from stripe effects and are computationally resource-limited. We propose a Real-Time Compressed Sensing (RTCS) network designed to be lightweight and to require only relatively few training samples for efficient and robust HSI reconstruction in the presence of the stripe effect and under noisy transmission conditions. The RTCS network features a simplified architecture that reduces the required training samples and allows for easy implementation on integer-8-based encoders, facilitating rapid compressed sensing for stripe-like HSI, which exactly matches the moderate design of miniaturized satellites with a push-broom scanning mechanism. This contrasts with optimization-based models that demand high-precision floating-point operations, making them difficult to deploy on edge devices. Our encoder employs an integer-8-compatible linear projection for stripe-like HSI data transmission, ensuring real-time compressed sensing. Furthermore, based on the novel two-streamed architecture, an efficient HSI restoration decoder is proposed for the receiver side, allowing for edge-device reconstruction without needing a sophisticated central server. This is particularly crucial as an increasing number of miniaturized satellites necessitates significant computing resources on the ground station. Extensive experiments validate the superior performance of our approach, offering new and vital capabilities for existing miniaturized satellite systems.

[97]  arXiv:2404.15854 (cross-list from cs.CR) [pdf, other]
Title: CLAD: Robust Audio Deepfake Detection Against Manipulation Attacks with Contrastive Learning
Comments: Submitted to IEEE TDSC
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)

The increasing prevalence of audio deepfakes poses significant security threats, necessitating robust detection methods. While existing detection systems exhibit promise, their robustness against malicious audio manipulations remains underexplored. To bridge the gap, we undertake the first comprehensive study of the susceptibility of the most widely adopted audio deepfake detectors to manipulation attacks. Surprisingly, even manipulations like volume control can significantly bypass detection without affecting human perception. To address this, we propose CLAD (Contrastive Learning-based Audio deepfake Detector) to enhance the robustness against manipulation attacks. The key idea is to incorporate contrastive learning to minimize the variations introduced by manipulations, thereby enhancing detection robustness. Additionally, we incorporate a length loss, aiming to improve the detection accuracy by clustering real audios more closely in the feature space. We comprehensively evaluated the most widely adopted audio deepfake detection models and our proposed CLAD against various manipulation attacks. The detection models exhibited vulnerabilities, with FAR rising to 36.69%, 31.23%, and 51.28% under volume control, fading, and noise injection, respectively. CLAD enhanced robustness, reducing the FAR to 0.81% under noise injection and consistently maintaining an FAR below 1.63% across all tests. Our source code and documentation are available in the artifact repository (https://github.com/CLAD23/CLAD).
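
The FAR figures above can be read with the following minimal sketch in mind. It assumes that FAR stands for false acceptance rate in its usual sense, that is, the fraction of fake samples that a detector accepts as real; the abstract does not expand the acronym, and the function name and label strings below are hypothetical.

```python
def false_acceptance_rate(labels, predictions):
    """Fraction of fake samples that the detector accepted as real.

    labels, predictions: sequences of the strings "real" or "fake".
    """
    fake_predictions = [p for l, p in zip(labels, predictions) if l == "fake"]
    if not fake_predictions:
        raise ValueError("FAR is undefined without any fake samples")
    accepted = sum(1 for p in fake_predictions if p == "real")
    return accepted / len(fake_predictions)


# Toy data: four fake clips, one of which slips past the detector.
labels = ["fake", "fake", "fake", "fake", "real"]
predictions = ["real", "fake", "fake", "fake", "real"]
print(f"FAR = {false_acceptance_rate(labels, predictions):.2%}")
```

Real samples do not enter the computation at all, so a low FAR alone says nothing about how often genuine audio is wrongly rejected.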

[98]  arXiv:2404.15939 (cross-list from cs.IR) [pdf, other]
Title: Telco-RAG: Navigating the Challenges of Retrieval-Augmented Language Models for Telecommunications
Comments: 6 pages, 5 Figures, 4 Tables, submitted to IEEE Globecom 2024 (see this https URL)
Subjects: Information Retrieval (cs.IR); Signal Processing (eess.SP)

The application of Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) systems in the telecommunication domain presents unique challenges, primarily due to the complex nature of telecom standard documents and the rapid evolution of the field. The paper introduces and open-sources Telco-RAG, a customized RAG framework designed to handle the specific needs of telecommunications standards, particularly 3rd Generation Partnership Project (3GPP) documents. Telco-RAG addresses the critical challenges of implementing a RAG pipeline on highly technical content, paving the way for applying LLMs in telecommunications and offering guidelines for RAG implementation in other technical domains.

[99]  arXiv:2404.15946 (cross-list from cs.CV) [pdf, ps, other]
Title: Mammo-CLIP: Leveraging Contrastive Language-Image Pre-training (CLIP) for Enhanced Breast Cancer Diagnosis with Multi-view Mammography
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)

Although fusion of information from multiple views of mammograms plays an important role in increasing the accuracy of breast cancer detection, developing multi-view mammogram-based computer-aided diagnosis (CAD) schemes still faces challenges, and no such CAD schemes have been used in clinical practice. To overcome the challenges, we investigate a new approach based on Contrastive Language-Image Pre-training (CLIP), which has sparked interest across various medical imaging tasks. By solving the challenges in (1) effectively adapting the single-view CLIP for multi-view feature fusion and (2) efficiently fine-tuning this parameter-dense model with limited samples and computational resources, we introduce Mammo-CLIP, the first multi-modal framework to process multi-view mammograms and corresponding simple texts. Mammo-CLIP uses an early feature fusion strategy to learn multi-view relationships in four mammograms acquired from the CC and MLO views of the left and right breasts. To enhance learning efficiency, plug-and-play adapters are added into the CLIP image and text encoders for fine-tuning parameters and limiting updates to about 1% of the parameters. For framework evaluation, we assembled two datasets retrospectively. The first dataset, comprising 470 malignant and 479 benign cases, was used for few-shot fine-tuning and internal evaluation of the proposed Mammo-CLIP via 5-fold cross-validation. The second dataset, including 60 malignant and 294 benign cases, was used to test the generalizability of Mammo-CLIP. Study results show that Mammo-CLIP outperforms the state-of-the-art cross-view transformer in AUC (0.841 vs. 0.817, 0.837 vs. 0.807) on both datasets. It also surpasses two previous CLIP-based methods by 20.3% and 14.3%. This study highlights the potential of applying fine-tuned vision-language models for developing next-generation, image-text-based CAD schemes for breast cancer.

[100]  arXiv:2404.15968 (cross-list from cs.IT) [pdf, ps, other]
Title: Fast and Robust Expectation Propagation MIMO Detection via Preconditioned Conjugated Gradient
Comments: Submitted to IEEE
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

We study the expectation propagation (EP) algorithm for symbol detection in massive multiple-input multiple-output (MIMO) systems. The EP detector shows excellent performance but suffers from a high computational complexity due to the matrix inversion, required in each EP iteration to perform marginal inference on a Gaussian system. We propose an inversion-free variant of the EP algorithm by treating inference on the mean and variance as two separate and simpler subtasks: We study the preconditioned conjugate gradient algorithm for obtaining the mean, which can significantly reduce the complexity and increase stability by relying on the Jacobi preconditioner that proves to fit the EP characteristics very well. For the variance, we use a simple approximation based on linear regression of the Gram channel matrix. Numerical studies on the Rayleigh-fading channel and on a realistic 3GPP channel model reveal the efficiency of the proposed scheme, which offers an attractive performance-complexity tradeoff and even outperforms the original EP detector in high multi-user inference cases where the matrix inversion becomes numerically unstable.
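
As background for the inversion-free idea above, here is a minimal sketch of the textbook conjugate gradient method with a Jacobi (diagonal) preconditioner, which solves A x = b for a symmetric positive-definite A without forming the inverse of A. It is not the paper's detector: the sketch assumes a small real-valued dense system stored as nested lists, whereas MIMO systems are generally complex-valued, and the function name `jacobi_pcg` is hypothetical.

```python
def jacobi_pcg(A, b, tol=1e-10, max_iter=100):
    """Solve A x = b for symmetric positive-definite A using conjugate
    gradient with a Jacobi (diagonal) preconditioner."""
    n = len(b)

    def matvec(v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    # Jacobi preconditioner: the inverse of the diagonal of A.
    inv_diag = [1.0 / A[i][i] for i in range(n)]

    x = [0.0] * n
    r = list(b)  # residual b - A x for the initial guess x = 0
    z = [d * ri for d, ri in zip(inv_diag, r)]  # preconditioned residual
    p = list(z)  # first search direction
    rz = dot(r, z)

    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            break
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)  # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        beta = rz_new / rz  # keeps the new direction A-conjugate
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


# Small example system; the exact solution is x = [1/11, 7/11].
solution = jacobi_pcg([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
print(solution)
```

Each iteration costs one matrix-vector product instead of a full inversion, which is the source of the complexity saving; the diagonal preconditioner helps most when A is strongly diagonally dominant.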

[101]  arXiv:2404.15992 (cross-list from cs.CV) [pdf, other]
Title: HDDGAN: A Heterogeneous Dual-Discriminator Generative Adversarial Network for Infrared and Visible Image Fusion
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Infrared and visible image fusion (IVIF) aims to preserve thermal radiation information from infrared images while integrating texture details from visible images, enabling the capture of important features and hidden details of subjects in complex scenes and disturbed environments. Consequently, IVIF offers distinct advantages in practical applications such as video surveillance, night navigation, and target recognition. However, prevailing methods often face challenges in simultaneously capturing thermal region features and detailed information due to the disparate characteristics of infrared and visible images. Consequently, fusion outcomes frequently entail a compromise between thermal target area information and texture details. In this study, we introduce a novel heterogeneous dual-discriminator generative adversarial network (HDDGAN) to address this issue. Specifically, the generator is structured as a multi-scale skip-connected structure, facilitating the extraction of essential features from different source images. To enhance the information representation ability of the fusion result, an attention mechanism is employed to construct the information fusion layer within the generator, leveraging the disparities between the source images. Moreover, recognizing the distinct learning requirements of information in infrared and visible images, we design two discriminators with differing structures. This approach aims to guide the model to learn salient information from infrared images while simultaneously capturing detailed information from visible images. Extensive experiments conducted on various public datasets demonstrate the superiority of our proposed HDDGAN over other state-of-the-art (SOTA) algorithms, highlighting its enhanced potential for practical applications.

[102]  arXiv:2404.15999 (cross-list from cs.LG) [pdf, other]
Title: BeSound: Bluetooth-Based Position Estimation Enhancing with Cross-Modality Distillation
Comments: Accepted in IEEE 6th International Conference on Activity and Behavior Computing
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)

Smart factories leverage advanced technologies to optimize manufacturing processes and enhance efficiency. Implementing worker tracking systems, primarily through camera-based methods, ensures accurate monitoring. However, concerns about worker privacy and technology protection make it necessary to explore alternative approaches. We propose a non-visual, scalable solution using Bluetooth Low Energy (BLE) and ultrasound coordinates. BLE position estimation offers a very low-power and cost-effective solution, as the technology is available on smartphones and is scalable due to the large number of smartphone users, facilitating worker localization and safety protocol transmission. Ultrasound signals provide faster response times and higher accuracy but require custom hardware, increasing costs. To combine the benefits of both modalities, we employ knowledge distillation (KD) from ultrasound signals to BLE RSSI data. Once the student model is trained, the model only takes as inputs the BLE-RSSI data for inference, retaining the advantages of ubiquity and low cost of BLE RSSI. We tested our approach using data from an experiment with twelve participants in a smart factory test bed environment. We obtained an increase of 11.79% in the F1-score compared to the baseline (target model without KD and trained with BLE-RSSI data only).
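
The knowledge distillation step described above can be illustrated with the commonly used soft-target loss, in which a student is trained to match the temperature-softened output distribution of a teacher. The abstract does not specify the loss used in the paper, so the formulation below (KL divergence scaled by the squared temperature), the function names, and the example logits are assumptions for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with a temperature parameter."""
    scaled = [v / temperature for v in logits]
    peak = max(scaled)
    exps = [math.exp(v - peak) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions,
    scaled by temperature squared as in the common soft-target recipe."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * math.log(pt / ps)
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0.0
    )
    return temperature ** 2 * kl


# Hypothetical logits over three position classes: the teacher stands in
# for the ultrasound model and the student for the BLE RSSI model.
teacher = [2.0, 0.5, -1.0]
student = [1.0, 1.0, -0.5]
print(distillation_loss(student, teacher))
```

The teacher is needed only while this loss is being minimized; at inference time the student runs on its own inputs, which matches the deployment described in the abstract.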

[103]  arXiv:2404.16005 (cross-list from cs.LG) [pdf, other]
Title: Unimodal and Multimodal Sensor Fusion for Wearable Activity Recognition
Authors: Hymalai Bello
Comments: Accepted in IEEE 22nd International Conference on Pervasive Computing and Communications (PerCom 2024)
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)

Combining different sensing modalities with multiple positions helps form a unified perception and understanding of complex situations such as human behavior. Hence, human activity recognition (HAR) benefits from combining redundant and complementary information (Unimodal/Multimodal). Even so, it is not an easy task. It requires a multidisciplinary approach, including expertise in sensor technologies, signal processing, data fusion algorithms, and domain-specific knowledge. This Ph.D. work employs sensing modalities such as inertial, pressure (audio and atmospheric pressure), and textile capacitive sensing for HAR. The scenarios explored are gesture and hand position tracking, facial and head pattern recognition, and body posture and gesture recognition. The selected wearable devices and sensing modalities are fully integrated with machine learning-based algorithms, some of which are implemented in the embedded device, on the edge, and tested in real-time.

[104]  arXiv:2404.16009 (cross-list from cs.IT) [pdf, other]
Title: How to Make Money From Fresh Data: Subscription Strategies in Age-Based Systems
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)

We consider a communication system consisting of a server that tracks and publishes updates about a time-varying data source or event, and a gossip network of users interested in closely tracking the event. The timeliness of the information is measured through the version age of information. The users wish to keep their expected version ages below a threshold, and can either rely on gossip from their neighbors or, if gossip does not meet the timeliness requirement, subscribe to the server directly to follow updates about the event. The server wishes to maximize its profit by increasing the number of subscribers and reducing costs associated with the frequent sampling of the event. We model the problem setup as a Stackelberg game between the server and the users, where the server commits to a frequency of sampling the event, and the users make decisions on whether to subscribe or not. As an initial step, we focus on directed networks with unidirectional flow of information and obtain the optimal equilibrium strategies for all the players. We provide simulation results to confirm the theoretical findings and provide additional insights.

Replacements for Thu, 25 Apr 24

[105]  arXiv:2008.10796 (replaced) [pdf, other]
Title: Deep Variational Network Toward Blind Image Restoration
Comments: Accepted by TPAMI@2024. Code: this https URL
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[106]  arXiv:2303.16659 (replaced) [pdf, other]
Title: Safe Zeroth-Order Optimization Using Quadratic Local Approximations
Comments: arXiv admin note: text overlap with arXiv:2211.02645
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
[107]  arXiv:2304.02649 (replaced) [pdf, other]
Title: Specialty-Oriented Generalist Medical AI for Chest CT Screening
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[108]  arXiv:2305.14662 (replaced) [pdf, other]
Title: Probabilistic wind power forecasting resilient to missing values: an adaptive quantile regression approach
Authors: Honglin Wen
Comments: 26 pages, the revision to Energy
Subjects: Applications (stat.AP); Systems and Control (eess.SY)
[109]  arXiv:2306.10564 (replaced) [pdf, other]
Title: On stability and state-norm estimation of switched systems under restricted switching
Authors: Atreyee Kundu
Comments: 17 pages, 4 figures. Longer version of a manuscript under review. arXiv admin note: text overlap with arXiv:2207.07764
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
[110]  arXiv:2308.04259 (replaced) [pdf, other]
Title: Generalized Forgetting Recursive Least Squares: Stability and Robustness Guarantees
Comments: Accepted to the IEEE Transactions on Automatic Control. Scheduled to appear in the 2024 November issue
Subjects: Systems and Control (eess.SY); Signal Processing (eess.SP)
[111]  arXiv:2309.01774 (replaced) [pdf, other]
Title: Variational Tracking and Redetection for Closely-spaced Objects in Heavy Clutter: Supplementary Materials
Comments: Supplementary Materials, including Appendices C-F, begin on page 25, with pages 1-24 constituting the main article. Key updates from the first arXiv version include: added comparisons of the sum-product-algorithm-based tracker, included Appendix C for association update derivation, and made minor adjustments throughout the article to enhance presentation
Subjects: Signal Processing (eess.SP)
[112]  arXiv:2309.09819 (replaced) [pdf, ps, other]
Title: Projection-based Prediction-Correction Method for Distributed Consensus Optimization
Authors: Han Long
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
[113]  arXiv:2311.08989 (replaced) [pdf, other]
Title: EMF-Aware Power Control for Massive MIMO: Cell-Free versus Cellular Networks
Comments: This work has been accepted for publication at 2024 IEEE WCNC. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
[114]  arXiv:2311.18736 (replaced) [pdf, other]
Title: Controlgym: Large-Scale Control Environments for Benchmarking Reinforcement Learning Algorithms
Comments: 25 pages, 16 figures
Subjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG); Optimization and Control (math.OC)
[115]  arXiv:2312.02599 (replaced) [pdf, other]
Title: MAINS: A Magnetic Field Aided Inertial Navigation System for Indoor Positioning
Comments: fix a missing reference
Subjects: Robotics (cs.RO); Signal Processing (eess.SP)
[116]  arXiv:2312.03620 (replaced) [pdf, other]
Title: Golden Gemini is All You Need: Finding the Sweet Spots for Speaker Verification
Comments: Accepted to IEEE/ACM Transactions on Audio, Speech, and Language Processing. Open Access: this https URL
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[117]  arXiv:2312.08555 (replaced) [pdf, other]
Title: KDAS: Knowledge Distillation via Attention Supervision Framework for Polyp Segmentation
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[118]  arXiv:2402.09871 (replaced) [pdf, other]
Title: MuChin: A Chinese Colloquial Description Benchmark for Evaluating Language Models in the Field of Music
Comments: Accepted by International Joint Conference on Artificial Intelligence 2024 (IJCAI 2024)
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[119]  arXiv:2402.16558 (replaced) [pdf, other]
Title: Open Your Ears and Take a Look: A State-of-the-Art Report on the Integration of Sonification and Visualization
Comments: 30 pages, 9 figures, accepted for EuroVis 2024 conference
Subjects: Human-Computer Interaction (cs.HC); Graphics (cs.GR); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[120]  arXiv:2403.01150 (replaced) [pdf, other]
Title: Error Analysis of a Simple Quaternion Estimator: Proof of the Bias and Covariance Matrix
Subjects: Methodology (stat.ME); Systems and Control (eess.SY)
[121]  arXiv:2403.09570 (replaced) [pdf, other]
Title: Multi-Fidelity Bayesian Optimization With Across-Task Transferable Max-Value Entropy Search
Comments: 12 pages, 8 figures, submitted to IEEE for peer review
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Signal Processing (eess.SP)
[122]  arXiv:2403.15390 (replaced) [pdf, ps, other]
Title: Reconfigurable Intelligent Surface Constructing 6G Near-Field Networks
Authors: Yajun Zhao
Comments: 22 pages. The manuscript, originally composed in Chinese, has been submitted to a Chinese journal. Presented here is the translated version of that Chinese manuscript. By uploading this manuscript to your preprint platform, we aim to garner additional insights and references from experts in the field
Journal-ref: ZHAO Yajun. Reconfigurable Intelligent Surface Constructing 6G Near-Field Networks[J]. Mobile Communications, 2024,48(4): 2-11
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)
[123]  arXiv:2403.15959 (replaced) [pdf, other]
Title: Risk-Calibrated Human-Robot Interaction via Set-Valued Intent Prediction
Comments: Website with additional information, videos, and code: this https URL
Subjects: Robotics (cs.RO); Systems and Control (eess.SY); Optimization and Control (math.OC)
[124]  arXiv:2403.20035 (replaced) [pdf, other]
Title: UltraLight VM-UNet: Parallel Vision Mamba Significantly Reduces Parameters for Skin Lesion Segmentation
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[125]  arXiv:2403.20168 (replaced) [pdf, other]
Title: Unsupervised Tumor-Aware Distillation for Multi-Modal Brain Image Translation
Comments: 8 pages, 5 figures. It has been provisionally accepted for IJCNN 2024
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[126]  arXiv:2404.03329 (replaced) [pdf, ps, other]
Title: DeepFunction: Deep Metric Learning-based Imbalanced Classification for Diagnosing Threaded Pipe Connection Defects using Functional Data
Comments: Revised version for submission to IISE Transactions
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Machine Learning (stat.ML)
[127]  arXiv:2404.08136 (replaced) [pdf, other]
Title: Exponentially Weighted Moving Models
Subjects: Computation (stat.CO); Signal Processing (eess.SP); Optimization and Control (math.OC); Computational Finance (q-fin.CP); Machine Learning (stat.ML)
[128]  arXiv:2404.12133 (replaced) [pdf, ps, other]
Title: On Target Detection in the Presence of Clutter in Joint Communication and Sensing Cellular Networks
Journal-ref: 2023 16th International Conference on Signal Processing and Communication System (ICSPCS)
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
[129]  arXiv:2404.13277 (replaced) [pdf, other]
Title: Beyond Score Changes: Adversarial Attack on No-Reference Image Quality Assessment from Two Perspectives
Comments: Submitted to a conference
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[130]  arXiv:2404.13537 (replaced) [pdf, other]
Title: Bracketing Image Restoration and Enhancement with High-Low Frequency Decomposition
Comments: This paper is accepted by CVPR 2024 Workshop, code: this https URL
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[131]  arXiv:2404.14700 (replaced) [pdf, other]
Title: FlashSpeech: Efficient Zero-Shot Speech Synthesis
Comments: Efficient zero-shot speech synthesis
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[132]  arXiv:2404.14836 (replaced) [src]
Title: Probabilistic forecasting of power system imbalance using neural network-based ensembles
Comments: One of the co-authors objected to having it on arXiv already
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
[133]  arXiv:2404.14956 (replaced) [pdf, other]
Title: DAWN: Domain-Adaptive Weakly Supervised Nuclei Segmentation via Cross-Task Interactions
Comments: 13 pages, 11 figures, 8 tables
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[134]  arXiv:2404.15256 (replaced) [pdf, other]
Title: TOP-Nav: Legged Navigation Integrating Terrain, Obstacle and Proprioception Estimation
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Systems and Control (eess.SY)
