Electrical Engineering and Systems Science

New submissions

New submissions for Fri, 19 Apr 24

[1]  arXiv:2404.11619 [pdf, ps, other]
Title: Advancing Speech Translation: A Corpus of Mandarin-English Conversational Telephone Speech
Comments: 2 pages
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)

This paper introduces a set of English translations for a 123-hour subset of the CallHome Mandarin Chinese data and the HKUST Mandarin Telephone Speech data for the task of speech translation. Paired source-language speech and target-language text is essential for training end-to-end speech translation systems and can provide substantial performance improvements for cascaded systems as well, relative to training on more widely available text data sets. We demonstrate that fine-tuning a general-purpose translation model to our Mandarin-English conversational telephone speech training set improves target-domain BLEU by more than 8 points, highlighting the importance of matched training data.

[2]  arXiv:2404.11621 [pdf, ps, other]
Title: Efficient High-Performance Bark-Scale Neural Network for Residual Echo and Noise Suppression
Comments: accepted to ICASSP 2024; 5 pages, 3 figures
Subjects: Audio and Speech Processing (eess.AS)

In recent years, the introduction of neural networks (NNs) into the field of speech enhancement has brought significant improvements. However, many of the proposed methods are quite demanding in terms of computational complexity and memory footprint. For the application in dedicated communication devices, such as speakerphones, hands-free car systems, or smartphones, efficiency plays a major role along with performance. In this context, we present an efficient, high-performance hybrid joint acoustic echo control and noise suppression system, whereby our main contribution is the postfilter NN, performing both noise and residual echo suppression. The preservation of nearend speech is improved by a Bark-scale auditory filterbank for the NN postfilter. The proposed hybrid method is benchmarked with state-of-the-art methods and its effectiveness is demonstrated on the ICASSP 2023 AEC Challenge blind test set. We demonstrate that it offers high-quality nearend speech preservation during both double-talk and nearend speech conditions. At the same time, it is capable of efficient removal of echo leaks, achieving a comparable performance to already small state-of-the-art models such as the end-to-end DeepVQE-S, while requiring only around 10 % of its computational complexity. This makes it easily realtime implementable on a speakerphone device.
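
As a rough illustration of what a Bark-scale band layout looks like, the following Python sketch maps frequencies to the Bark scale with Traunmüller's closed-form approximation and derives band edges that are equally spaced in Bark. The function names, the 24 bands and the 8 kHz upper limit are assumptions made for illustration only; the abstract does not specify the filterbank design used in the paper.

```python
def hz_to_bark(freq_hz):
    """Traunmueller's approximation of the Bark scale (critical-band rate)."""
    return 26.81 * freq_hz / (1960.0 + freq_hz) - 0.53


def bark_to_hz(bark):
    """Algebraic inverse of hz_to_bark."""
    return 1960.0 * (bark + 0.53) / (26.28 - bark)


def bark_band_edges(n_bands, f_max_hz):
    """Return n_bands + 1 band edges in Hz, equally spaced on the Bark scale."""
    lo = hz_to_bark(0.0)
    hi = hz_to_bark(f_max_hz)
    step = (hi - lo) / n_bands
    return [bark_to_hz(lo + i * step) for i in range(n_bands + 1)]


# Hypothetical settings: 24 bands up to 8 kHz (not taken from the paper).
edges = bark_band_edges(n_bands=24, f_max_hz=8000.0)
widths = [b - a for a, b in zip(edges, edges[1:])]
```

The resulting bands are narrow at low frequencies and widen towards high frequencies, which is the property that lets a Bark-scale representation use far fewer bands than a uniform frequency grid.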

[3]  arXiv:2404.11707 [pdf, other]
Title: Perspectives on Contractivity in Control, Optimization, and Learning
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)

Contraction theory is a mathematical framework for studying the convergence, robustness, and modularity properties of dynamical systems and algorithms. In this opinion paper, we provide five main opinions on the virtues of contraction theory. These opinions are (i) contraction theory is a unifying framework emerging from classical and modern works, (ii) contractivity is computationally-friendly, robust, and modular stability, (iii) numerous dynamical systems are contracting, (iv) contraction theory is relevant to modern applications, and (v) contraction theory can be vastly extended in numerous directions. We survey recent theoretical and applied research in each of these five directions.

[4]  arXiv:2404.11725 [pdf, ps, other]
Title: Postoperative glioblastoma segmentation: Development of a fully automated pipeline using deep convolutional neural networks and comparison with currently available models
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Accurately assessing tumor removal is paramount in the management of glioblastoma. We developed a pipeline using MRI scans and neural networks to segment tumor subregions and the surgical cavity in postoperative images. Our model excels in accurately classifying the extent of resection, offering a valuable tool for clinicians in assessing treatment effectiveness.

[5]  arXiv:2404.11771 [pdf, ps, other]
Title: IoT-Driven Cloud-based Energy and Environment Monitoring System for Manufacturing Industry
Subjects: Systems and Control (eess.SY)

This research focused on the development of a cost-effective IoT solution for energy and environment monitoring geared towards manufacturing industries. The proposed system is developed using open-source software that can be easily deployed in any manufacturing environment. The system collects real-time temperature, humidity, and energy data from different devices running on different communication protocols, such as TCP/IP and Modbus, and the data is transferred wirelessly using an MQTT client to a database working as a cloud storage solution. The collected data is then visualized and analyzed using a website running on a host machine working as a web client.

[6]  arXiv:2404.11836 [pdf, other]
Title: AI-Empowered RIS-Assisted Networks: CV-Enabled RIS Selection and DNN-Enabled Transmission
Subjects: Signal Processing (eess.SP)

This paper investigates artificial intelligence (AI) empowered schemes for reconfigurable intelligent surface (RIS) assisted networks from the perspective of fast implementation. We formulate a weighted sum-rate maximization problem for a multi-RIS-assisted network. To avoid the huge channel estimation overhead caused by activating all RISs, we propose a computer vision (CV) enabled RIS selection scheme based on a single shot multi-box detector. To realize real-time resource allocation, a deep neural network (DNN) enabled transmit design is developed to learn the optimal mapping from channel information to transmit beamformers and phase shift matrix. Numerical results illustrate that the CV module is able to select the RIS with the best propagation condition. The well-trained DNN achieves similar sum-rate performance to the existing alternative optimization method but with much smaller inference time.

[7]  arXiv:2404.11843 [pdf, other]
Title: Computer-Aided Diagnosis of Thoracic Diseases in Chest X-rays using hybrid CNN-Transformer Architecture
Authors: Sonit Singh
Comments: 24 pages, 13 Figures, 13 Tables. arXiv admin note: text overlap with arXiv:1904.09925 by other authors
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Medical imaging has been used for diagnosis of various conditions, making it one of the most powerful resources for effective patient care. Due to widespread availability, low cost, and low radiation, chest X-ray is one of the most sought-after radiology examinations for the diagnosis of various thoracic diseases. Due to advancements in medical imaging technologies and increasing patient load, the current radiology workflow faces various challenges including increasing backlogs, long working hours, and an increase in diagnostic errors. An automated computer-aided diagnosis system that can interpret chest X-rays to augment radiologists by providing actionable insights has the potential to provide a second opinion to radiologists and highlight relevant regions in the image, in turn expediting clinical workflow, reducing diagnostic errors, and improving patient care. In this study, we applied a novel architecture augmenting the DenseNet121 Convolutional Neural Network (CNN) with a multi-head self-attention mechanism using a transformer, namely SA-DenseNet121, that can identify multiple thoracic diseases in chest X-rays. We conducted experiments on four of the largest chest X-ray datasets, namely, ChestX-ray14, CheXpert, MIMIC-CXR-JPG, and IU-CXR. Experimental results in terms of area under the receiver operating characteristics (AUC-ROC) show that augmenting a CNN with self-attention has potential in diagnosing different thoracic diseases from chest X-rays. The proposed methodology has the potential to support the reading workflow, improve efficiency, and reduce diagnostic errors.

[8]  arXiv:2404.11858 [pdf, other]
Title: Graph Neural Networks for Wireless Networks: Graph Representation, Architecture and Evaluation
Subjects: Signal Processing (eess.SP)

Graph neural networks (GNNs) have been regarded as the basic model to facilitate deep learning (DL) to revolutionize resource allocation in wireless networks. GNN-based models are shown to be able to learn the structural information about graphs representing the wireless networks to adapt to the time-varying channel state information and dynamics of network topology. This article aims to provide a comprehensive overview of applying GNNs to optimize wireless networks via answering three fundamental questions, i.e., how to input the wireless network data into GNNs, how to improve the performance of GNNs, and how to evaluate GNNs. Particularly, two graph representations are given to transform wireless network parameters into graph-structured data. Then, we focus on the architecture design of the GNN-based models via introducing the basic message passing as well as model improvement methods including multi-head attention mechanism and residual structure. At last, we give task-oriented evaluation metrics for DL-enabled wireless resource allocation. We also highlight certain challenges and potential research directions for the application of GNNs in wireless networks.

[9]  arXiv:2404.11861 [pdf, other]
Title: sEMG-based Fine-grained Gesture Recognition via Improved LightGBM Model
Subjects: Signal Processing (eess.SP)

Surface electromyogram (sEMG), as a bioelectrical signal reflecting the activity of human muscles, has a wide range of applications in the control of prosthetics, human-computer interaction and so on. However, existing recognition methods all target discrete actions: after each action is executed, the resting state must be restored before the next action, so the gestures of continuous actions cannot be recognized effectively. To solve this problem, this paper proposes an improved fine gesture recognition model based on the LightGBM algorithm. A sliding window sample segmentation scheme is adopted to replace active segment detection, and a series of innovative schemes such as an improved loss function, Optuna hyperparameter search and Bagging integration are adopted to optimize the LightGBM model and realize gesture recognition of continuous active segment signals. In order to verify the effectiveness of the proposed algorithm, we used the NinaproDB7 dataset to design the normal data recognition experiment and the disabled data transfer experiment. The results showed that the recognition rate of the proposed model was 89.72% higher than that of the optimal model Bi-ConvGRU for 18 gesture recognition tasks in the open data set; it reached 90.28%. Compared with the scheme directly trained on small sample data, the recognition rate of transfer learning was significantly improved from 60.35% to 78.54%, effectively solving the problem of insufficient data, and proving the applicability and advantages of transfer learning in fine gesture recognition tasks for disabled people.
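
The sliding-window segmentation mentioned in this abstract can be sketched in a few lines of Python. This is a minimal, generic version: the window length, the step size and the function name are hypothetical, and the abstract does not state the values or the exact scheme used in the paper.

```python
def sliding_windows(signal, window, step):
    """Cut a 1-D sequence into overlapping, fixed-length windows.

    Every window has exactly `window` samples; trailing samples that do not
    fill a complete window are dropped.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    last_start = len(signal) - window
    return [signal[i:i + window] for i in range(0, last_start + 1, step)]


# Toy stand-in for one sEMG channel (hypothetical data).
samples = list(range(10))
segments = sliding_windows(samples, window=4, step=2)
```

Because the windows are taken at a fixed stride over the whole recording, no separate detector is needed to find where each gesture starts and ends; each window can be turned into a feature vector and classified on its own.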

[10]  arXiv:2404.11882 [pdf, other]
Title: Hybrid Navigation Acceptability and Safety
Subjects: Systems and Control (eess.SY); Robotics (cs.RO)

Autonomous vessels have emerged as a prominent and accepted solution, particularly in the naval defence sector. However, achieving full autonomy for marine vessels demands the development of robust and reliable control and guidance systems that can handle various encounters with manned and unmanned vessels while operating effectively under diverse weather and sea conditions. A significant challenge in this pursuit is ensuring the autonomous vessels' compliance with the International Regulations for Preventing Collisions at Sea (COLREGs). These regulations present a formidable hurdle for the human-level understanding by autonomous systems as they were originally designed from common navigation practices created since the mid-19th century. Their ambiguous language assumes experienced sailors' interpretation and execution, and therefore demands a high-level (cognitive) understanding of language and agent intentions. These capabilities surpass the current state-of-the-art in intelligent systems. This position paper highlights the critical requirements for a trustworthy control and guidance system, exploring the complexity of adapting COLREGs for safe vessel-on-vessel encounters considering autonomous maritime technology competing and/or cooperating with manned vessels.

[11]  arXiv:2404.11889 [pdf, other]
Title: Multi-view X-ray Image Synthesis with Multiple Domain Disentanglement from CT Scans
Comments: 13 pages, 10 figures
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

X-ray images play a vital role in intraoperative processes due to their high resolution and fast imaging speed, and they greatly promote subsequent segmentation, registration and reconstruction. However, over-dosed X-rays pose potential risks to human health to some extent. Data-driven algorithms from volume scans to X-ray images are restricted by the scarcity of paired X-ray and volume data. Existing methods are mainly realized by modelling the whole X-ray imaging procedure. In this study, we propose a learning-based approach termed CT2X-GAN to synthesize X-ray images in an end-to-end manner using content and style disentanglement from three different image domains. Our method decouples the anatomical structure information from CT scans and style information from unpaired real X-ray images/digitally reconstructed radiography (DRR) images via a series of decoupling encoders. Additionally, we introduce a novel consistency regularization term to improve the stylistic resemblance between synthesized X-ray images and real X-ray images. Meanwhile, we also impose a supervised process by computing the similarity of computed real DRR and synthesized DRR images. We further develop a pose attention module to fully strengthen the comprehensive information in the decoupled content code from CT scans, facilitating high-quality multi-view image synthesis in the lower 2D space. Extensive experiments were conducted on the publicly available CTSpine1K dataset and achieved 97.8350, 0.0842 and 3.0938 in terms of FID, KID and defined user-scored X-ray similarity, respectively. In comparison with 3D-aware methods ($\pi$-GAN, EG3D), CT2X-GAN is superior in synthesis quality and in realism with respect to real X-ray images.

[12]  arXiv:2404.11900 [pdf, other]
Title: A New Hybrid Automaton Framework with Partial Differential Equation Dynamics
Comments: 17 pages
Subjects: Systems and Control (eess.SY)

This paper presents the syntax and semantics of a novel type of hybrid automaton (HA) with partial differential equation (PDE) dynamics, partial differential hybrid automata (PDHA). In PDHA, we add a spatial domain $X$ and harness a mathematical concept, the partition, to help us formally define the spatial relations. While classically the dynamics of HA are described by ordinary differential equations (ODEs) and differential inclusions, PDHA is capable of describing the behavior of cyber-physical systems (CPS) with continuous dynamics that cannot be modelled using the canonical hybrid systems' framework. For the purposes of analyzing PDHA, we propose another model called the discrete space partial differential hybrid automata (DSPDHA), which handles discrete spatial domains using finite difference methods (FDM); this simple and intuitive approach reduces the PDHA into HA with ODE systems. We conclude with two illustrative examples in order to exhibit the nature of PDHA and DSPDHA.
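
To make the finite-difference reduction concrete, the Python sketch below discretizes the 1-D heat equation $u_t = \alpha u_{xx}$ in space, which turns the PDE into one ODE per grid node, and then integrates that ODE system with forward Euler. The choice of the heat equation, the fixed end nodes, the grid and all names are assumptions for illustration; the sketch covers only the continuous dynamics of a single mode, not the automaton semantics defined in the paper.

```python
def heat_rhs(u, alpha, dx):
    """Right-hand side of the ODE system obtained from u_t = alpha * u_xx.

    The second derivative is replaced by a central difference on interior
    nodes; the two end nodes are held fixed (Dirichlet boundary).
    """
    du = [0.0] * len(u)
    for i in range(1, len(u) - 1):
        du[i] = alpha * (u[i - 1] - 2.0 * u[i] + u[i + 1]) / dx ** 2
    return du


def euler_step(u, alpha, dx, dt):
    """One forward-Euler step; stable here when dt <= dx**2 / (2 * alpha)."""
    du = heat_rhs(u, alpha, dx)
    return [ui + dt * dui for ui, dui in zip(u, du)]


# Hypothetical initial profile: a single hot node in the middle.
u = [0.0, 0.0, 1.0, 0.0, 0.0]
for _ in range(50):
    u = euler_step(u, alpha=1.0, dx=1.0, dt=0.1)
```

After the spatial discretization each node value is an ordinary state variable, so the usual ODE-based tools for hybrid automata can in principle be applied to the reduced model.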

[13]  arXiv:2404.11929 [pdf, other]
Title: A Symmetric Regressor for MRI-Based Assessment of Striatal Dopamine Transporter Uptake in Parkinson's Disease
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Dopamine transporter (DAT) imaging is commonly used for monitoring Parkinson's disease (PD), where the striatal DAT uptake amount is computed to assess PD severity. However, DAT imaging has a high cost, carries the risk of radiation exposure, and is not available in general clinics. Recently, an MRI patch of the nigral region has been proposed as a safer and easier alternative. This paper proposes a symmetric regressor for predicting the DAT uptake amount from the nigral MRI patch. Acknowledging the symmetry between the right and left nigrae, the proposed regressor incorporates a paired input-output model that simultaneously predicts the DAT uptake amounts for both the right and left striata. Moreover, it employs a symmetric loss that imposes a constraint on the difference between right-to-left predictions, reflecting the high correlation in DAT uptake amounts in the two lateral sides. Additionally, we propose a symmetric Monte-Carlo (MC) dropout method for providing a useful uncertainty estimate of the DAT uptake prediction, which utilizes the above symmetry. We evaluated the proposed approach on 734 nigral patches, which demonstrated significantly improved performance of the symmetric regressor compared with the standard regressors while giving better explainability and feature representation. The symmetric MC dropout also gave precise uncertainty ranges with a high probability of including the true DAT uptake amounts within the range.

[14]  arXiv:2404.11941 [pdf, other]
Title: Semantic Satellite Communications Based on Generative Foundation Model
Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Signal Processing (eess.SP); Image and Video Processing (eess.IV)

Satellite communications can provide massive connections and seamless coverage, but they also face several challenges, such as rain attenuation, long propagation delays, and co-channel interference. To improve transmission efficiency and address severe scenarios, semantic communication has become a popular choice, particularly when equipped with foundation models (FMs). In this study, we introduce an FM-based semantic satellite communication framework, termed FMSAT. This framework leverages FM-based segmentation and reconstruction to significantly reduce bandwidth requirements and accurately recover semantic features under high noise and interference. Considering the high speed of satellites, an adaptive encoder-decoder is proposed to protect important features and avoid frequent retransmissions. Meanwhile, a well-received image can provide a reference for repairing damaged images under sudden attenuation. Since acknowledgment feedback is subject to long propagation delays when retransmission is unavoidable, a novel error detection method is proposed to roughly detect semantic errors at the regenerative satellite. With the proposed detectors at both the satellite and the gateway, the quality of the received images can be ensured. The simulation results demonstrate that the proposed method can significantly reduce bandwidth requirements, adapt to complex satellite scenarios, and protect semantic information with an acceptable transmission delay.

[15]  arXiv:2404.11959 [pdf, other]
Title: Segmented Model-Based Hydrogen Delivery Control for PEM Fuel Cells: a Port-Hamiltonian Approach
Comments: 12 pages, 11 Figures
Subjects: Systems and Control (eess.SY)

This paper proposes an extended interconnection and damping assignment passivity-based control technique (IDA-PBC) to control the pressure dynamics in the fuel delivery subsystem (FDS) of proton exchange membrane fuel cells. The fuel cell stack is a distributed parameter system that can be modeled by partial differential equations (PDEs). In this paper, the segmentation concept is used to approximate the PDE model by an ordinary differential equation (ODE) model. Each segment therefore has multiple ODEs, which together form the lumped model of the segments. Subsequently, a generalized multi-input multi-output lumped parameter model is developed in the port-Hamiltonian framework based on mass balance to minimize the modeling error. The modeling errors arise due to the difference between spatially distributed pressures in FDS segments, and also due to the difference between the actual stack pressure and the measured output pressure of the anode. The feasibility of interconnecting the segments is ensured by maintaining passivity of each segment. With consideration of re-circulation and bleeding of the anode in the modeling, an extended energy-shaping and output tracking IDA-PBC based state-feedback controller is proposed to control the spatially distributed pressure dynamics in the anode. Furthermore, a sliding mode observer of high order is designed to estimate the unmeasurable pressures in FDS with known disturbances. Performance recovery of output feedback control is accomplished with explicit stability analysis. The effectiveness of the proposed IDA-PBC approach is validated by the simulation results.

[16]  arXiv:2404.11974 [pdf, other]
Title: Device (In)Dependence of Deep Learning-based Image Age Approximation
Comments: This work was accepted and presented in: 2022 ICPR-Workshop on Artificial Intelligence for Multimedia Forensics and Disinformation Detection. Montreal, Quebec, Canada. However, due to a technical issue on the publishing companies' side, the work does not appear in the workshop proceedings
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

The goal of temporal image forensics is to approximate the age of a digital image relative to images from the same device. Usually, this is based on traces left during the image acquisition pipeline. For example, several methods exist that exploit the presence of in-field sensor defects for this purpose. In addition to these 'classical' methods, there is also an approach in which a Convolutional Neural Network (CNN) is trained to approximate the image age. One advantage of a CNN is that it independently learns the age features used. This would make it possible to exploit other (different) age traces in addition to the known ones (i.e., in-field sensor defects). In a previous work, we have shown that the presence of strong in-field sensor defects is irrelevant for a CNN to predict the age class. Based on this observation, the question arises how device (in)dependent the learned features are. In this work, we empirically assess this by training a network on images from a single device and then applying the trained model to images from different devices. This evaluation is performed on 14 different devices, including 10 devices from the publicly available 'Northumbria Temporal Image Forensics' database. These 10 different devices are based on five different device pairs (i.e., with the identical camera model).

[17]  arXiv:2404.11995 [pdf, other]
Title: Cost and CO2 emissions co-optimisation of green hydrogen production in a grid-connected renewable energy system
Subjects: Systems and Control (eess.SY)

Green hydrogen is essential for producing renewable fuels that are needed in sectors that are hard to electrify directly. Hydrogen production in a grid-connected hybrid renewable energy plant necessitates smart planning to meet long-term hydrogen trading agreements while minimising costs and emissions. Previous research analysed the economic and environmental impact of hydrogen production based on full foresight of renewable energy availability, electricity price, and CO2 intensity in the electricity grid. However, the full foresight assumption is impractical in day-to-day operation, often leading to underestimations of both the cost and CO2 emissions associated with hydrogen production. Therefore, this research introduces a novel long-term planner that uses historical data and short-term forecasts to plan hydrogen production in the day-to-day operation of a grid-connected hybrid renewable energy plant. The long-term planner co-minimises cost and CO2 emissions to determine the hydrogen production for the next day taking into account the remaining hydrogen production and the time remaining until the end of the delivery period, which can be a week, a month, or a year. Extended delivery periods provide operation flexibility, enabling cost and CO2 emissions reductions. Significant reductions in CO2 emissions can be achieved with relatively small increases in the levelised cost. Under day-to-day operation, the levelised cost of hydrogen is marginally higher than that of the full foresight; the CO2 emissions can be up to 60% higher. Despite a significant portion of the produced hydrogen not meeting the criteria for green hydrogen designation under current rules, CO2 emissions are lower than those from existing alternative hydrogen production methods. These results underscore the importance of balancing cost considerations with environmental impacts in operational decision-making.

[18]  arXiv:2404.12025 [pdf, other]
Title: PID Tuning using Cross-Entropy Deep Learning: a Lyapunov Stability Analysis
Journal-ref: IFAC-PapersOnLine, Volume 55, Issue 31, 2022
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Robotics (cs.RO)

Underwater Unmanned Vehicles (UUVs) have to constantly compensate for the external disturbing forces acting on their body. Adaptive Control theory is commonly used there to grant the control law some flexibility in its response to process variation. Today, learning-based (LB) adaptive methods are leading the field where model-based control structures are combined with deep model-free learning algorithms. This work proposes experiments and metrics to empirically study the stability of such a controller. We perform this stability analysis on an LB adaptive control system whose adaptive parameters are determined using a Cross-Entropy Deep Learning method.

[19]  arXiv:2404.12030 [pdf, other]
Title: Mapping back and forth between model predictive control and neural networks
Comments: 13 pages
Subjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI)

Model predictive control (MPC) for linear systems with quadratic costs and linear constraints is shown to admit an exact representation as an implicit neural network. A method to "unravel" the implicit neural network of MPC into an explicit one is also introduced. As well as building links between model-based and data-driven control, these results emphasize the capability of implicit neural networks for representing solutions of optimisation problems, as such problems are themselves implicitly defined functions.
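
One way to see why a constrained quadratic program can look like an implicit network is to write its solution as the fixed point of a gradient step followed by a saturation, where the saturation is built from two ReLU units. The Python sketch below does this for a scalar, one-step MPC problem with an input bound. It is an illustrative assumption based on the standard projected-gradient fixed point and is not claimed to be the construction used in the paper; the system parameters and function names are hypothetical.

```python
def relu(v):
    return max(v, 0.0)


def clip_via_relu(v, lo, hi):
    """Saturation written with two ReLU units: equals min(max(v, lo), hi)."""
    return lo + relu(v - lo) - relu(v - hi)


def one_step_mpc(x, a=1.2, b=1.0, q=1.0, r=0.1, u_max=0.5, iters=200):
    """Minimise q*(a*x + b*u)**2 + r*u**2 subject to |u| <= u_max.

    The minimiser is computed as the fixed point of
    u = clip(u - step * (h*u + g)), i.e. an affine map followed by ReLUs.
    """
    h = 2.0 * (q * b * b + r)   # curvature of the cost in u
    g = 2.0 * q * a * b * x     # linear term, depends on the current state
    step = 0.5 / h              # any step in (0, 2/h) converges
    u = 0.0
    for _ in range(iters):
        u = clip_via_relu(u - step * (h * u + g), -u_max, u_max)
    return u


def closed_form(x, a=1.2, b=1.0, q=1.0, r=0.1, u_max=0.5):
    """Reference solution: saturate the unconstrained minimiser."""
    u_free = -(q * a * b * x) / (q * b * b + r)
    return min(max(u_free, -u_max), u_max)
```

For this scalar case the fixed point agrees with the closed-form saturated control law, so the implicit form and the explicit piecewise-linear form describe the same map from state to input.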

[20]  arXiv:2404.12060 [pdf, other]
Title: Environment-aware UAV Communications: CKM Construction and Predictive Beamforming
Subjects: Signal Processing (eess.SP)

Predictive millimeter-wave (mmWave) beamforming is a promising technique to enable low-latency and high-rate ground-air communications for cellular-connected unmanned aerial vehicles (UAVs). However, the high vulnerability of mmWave to blockages poses practical challenges to the implementation of such a technology. In this paper, we tackle the challenges by proposing a channel knowledge map (CKM)-assisted predictive beamforming approach based on the echoed joint communication and sensing signal, whereby the line-of-sight (LoS) link identification is performed via hypothesis testing using prior information provided by CKM. Depending on the identification result, extended Kalman filtering (EKF) is adopted to reliably track the target UAV. Furthermore, if the non-line-of-sight (NLoS) state is identified, the target UAV will be immediately connected to a candidate base station (BS), namely a handover will be triggered to alleviate the communication outage. The simulation results show that the proposed method can significantly enhance the UAV tracking and mmWave communication performance compared to the benchmarking schemes without using CKM or LoS identification.

[21]  arXiv:2404.12089 [pdf, other]
Title: An Overview of Electromagnetic Illusions: Empowering Smart Environments with Reconfigurable Metasurfaces
Subjects: Signal Processing (eess.SP); Applied Physics (physics.app-ph)

This study delves into the innovative landscape of metasurfaces, with a particular focus on their role in achieving electromagnetic (EM) illusion (EMI), a facet of paramount significance. The control of EM waves assumes a pivotal role in mitigating issues such as signal degradation, interference, and reduced communication range. Furthermore, the engineering of waves serves as a foundational element in achieving invisibility or minimized detectability. This survey unravels the theoretical underpinnings and practical designs of EMI coatings, which have been harnessed to develop functional metasurfaces. EMI, practically achieved through engineered coatings, confers a strategic advantage by either reducing the radar cross-section of objects or creating misleading footprints. In addition to illustrating the outstanding achievements in reconfigurable cloaking, this study culminates in the proposal of a novel approach, suggesting the emergence of EMI without the need for physically coating the device to conceal and thus proposing the concept of a smart EMI environment. This groundbreaking work opens a new way for engineers and researchers to unlock exotic and versatile designs that build on reconfigurable intelligent surfaces (RIS). Crucially, the designs enabled by the proposed approach present a wide array of applications, encompassing camouflaging, deceptive sensing, radar cognition control, and defence security, among others. In essence, this research stands as a beacon guiding the exploration of uncharted territories in wave control through smart EMI environments, with profound implications spanning basic academic research in RIS through advanced security technologies and communication systems.

[22]  arXiv:2404.12097 [pdf, other]
Title: MPC of Uncertain Nonlinear Systems with Meta-Learning for Fast Adaptation of Neural Predictive Models
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)

In this paper, we consider the problem of reference tracking in uncertain nonlinear systems. A neural State-Space Model (NSSM) is used to approximate the nonlinear system, where a deep encoder network learns the nonlinearity from data, and a state-space component captures the temporal relationship. This transforms the nonlinear system into a linear system in a latent space, enabling the application of model predictive control (MPC) to determine effective control actions. Our objective is to design the optimal controller using limited data from the \textit{target system} (the system of interest). To this end, we employ an implicit model-agnostic meta-learning (iMAML) framework that leverages information from \textit{source systems} (systems that share similarities with the target system) to expedite training in the target system and enhance its control performance. The framework consists of two phases: the (offline) meta-training phase learns an aggregated NSSM using data from source systems, and the (online) meta-inference phase quickly adapts this aggregated model to the target system using only a few data points and few online training iterations, based on local loss function gradients. The iMAML algorithm exploits the implicit function theorem to exactly compute the gradient during training, without relying on the entire optimization path. By focusing solely on the optimal solution, rather than the path, we can meta-train with less storage complexity and fewer approximations than other contemporary meta-learning algorithms. We demonstrate through numerical examples that our proposed method can yield accurate predictive models by adaptation, resulting in a downstream MPC that outperforms several baselines.

[23]  arXiv:2404.12148 [pdf, other]
Title: Unknown Interference Modeling for Rate Adaptation in Cell-Free Massive MIMO Networks
Comments: 6 pages, Accepted at IEEE Wireless Communications and Networking Conference (WCNC), 2024. arXiv admin note: text overlap with arXiv:2305.07344
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)

Co-channel interference poses a challenge in any wireless communication network where the time-frequency resources are reused over different geographical areas. The interference is particularly diverse in cell-free massive multiple-input multiple-output (MIMO) networks, where a large number of user equipments (UEs) are multiplexed by a multitude of access points (APs) on the same time-frequency resources. For realistic and scalable network operation, only the interference from UEs belonging to the same serving cluster of APs can be estimated in real-time and suppressed by precoding/combining. As a result, the unknown interference arising from scheduling variations in neighboring clusters makes the rate adaptation hard and can lead to outages. This paper aims to model the unknown interference power in the uplink of a cell-free massive MIMO network. The results show that the proposed method effectively describes the distribution of the unknown interference power and provides a tool for rate adaptation with guaranteed target outage.

[24]  arXiv:2404.12163 [pdf, other]
Title: Unsupervised Microscopy Video Denoising
Comments: Accepted at CVPRW 2024
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

In this paper, we introduce a novel unsupervised network to denoise microscopy videos, i.e., image sequences captured by a fixed-location microscopy camera. Specifically, we propose a DeepTemporal Interpolation method, leveraging a temporal signal filter integrated into the bottom CNN layers, to restore microscopy videos corrupted by unknown noise types. Our unsupervised denoising architecture is distinguished by its ability to adapt to multiple noise conditions without the need for pre-existing noise distribution knowledge, addressing a significant challenge in real-world medical applications. Furthermore, we evaluate our denoising framework using both real microscopy recordings and simulated data, validating its video denoising performance across a broad spectrum of noise scenarios. Extensive experiments demonstrate that our unsupervised model consistently outperforms state-of-the-art supervised and unsupervised video denoising techniques, proving especially effective for microscopy videos.

[25]  arXiv:2404.12165 [pdf, other]
Title: Stability Certificates for Receding Horizon Games
Subjects: Systems and Control (eess.SY)

Game-theoretic MPC (or Receding Horizon Games) is an emerging control methodology for multi-agent systems that generates control actions by solving a dynamic game with coupling constraints in a receding-horizon fashion. This control paradigm has recently received increasing attention in various application fields, including robotics, autonomous driving, traffic networks, and energy grids, due to its ability to model the competitive nature of self-interested agents with shared resources while incorporating future predictions, dynamic models, and constraints into the decision-making process. In this work, we present the first formal stability analysis based on dissipativity and monotone operator theory that is valid also for non-potential games. Specifically, we derive LMI-based certificates that ensure asymptotic stability and are numerically verifiable. Moreover, we show that, if the agents have decoupled dynamics, the numerical verification can be performed in a scalable manner. Finally, we present tuning guidelines for the agents' cost function weights to fulfill the certificates and, thus, ensure stability.

[26]  arXiv:2404.12170 [pdf, other]
Title: Secure Semantic Communication for Image Transmission in the Presence of Eavesdroppers
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)

Semantic communication (SemCom) has emerged as a key technology for the forthcoming sixth-generation (6G) network, attributed to its enhanced communication efficiency and robustness against channel noise. However, the open nature of wireless channels renders them vulnerable to eavesdropping, posing a serious threat to privacy. To address this issue, we propose a novel secure semantic communication (SemCom) approach for image transmission, which integrates steganography technology to conceal private information within non-private images (host images). Specifically, we propose an invertible neural network (INN)-based signal steganography approach, which embeds channel input signals of a private image into those of a host image before transmission. This ensures that the original private image can be reconstructed from the received signals at the legitimate receiver, while the eavesdropper can only decode the information of the host image. Simulation results demonstrate that the proposed approach maintains comparable reconstruction quality of both host and private images at the legitimate receiver, compared to scenarios without any secure mechanisms. Experiments also show that the eavesdropper is only able to reconstruct host images, showcasing the enhanced security provided by our approach.

[27]  arXiv:2404.12187 [pdf, other]
Title: Stability-informed Bayesian Optimization for MPC Cost Function Learning
Comments: 7 pages, 3 figures, accepted for NMPC 2024
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)

Designing predictive controllers towards optimal closed-loop performance while maintaining safety and stability is challenging. This work explores closed-loop learning for predictive control parameters under imperfect information while considering closed-loop stability. We employ constrained Bayesian optimization to learn a model predictive controller's (MPC) cost function parametrized as a feedforward neural network, optimizing closed-loop behavior as well as minimizing model-plant mismatch. Doing so offers a high degree of freedom and, thus, the opportunity for efficient and global optimization towards the desired and optimal closed-loop behavior. We extend this framework by stability constraints on the learned controller parameters, exploiting the optimal value function of the underlying MPC as a Lyapunov candidate. The effectiveness of the proposed approach is underlined in simulations, highlighting its performance and safety capabilities.

[28]  arXiv:2404.12296 [pdf, other]
Title: Long Duration Battery Sizing, Siting, and Operation Under Wildfire Risk Using Progressive Hedging
Subjects: Systems and Control (eess.SY)

Battery sizing and siting problems are computationally challenging due to the need to make long-term planning decisions that are cognizant of short-term operational decisions. This paper considers sizing, siting, and operating batteries in a power grid to maximize their benefits, including price arbitrage and load shed mitigation, during both normal operations and periods with high wildfire ignition risk. We formulate a multi-scenario optimization problem for long duration battery storage while considering the possibility of load shedding during Public Safety Power Shutoff (PSPS) events that de-energize lines to mitigate severe wildfire ignition risk. To enable a computationally scalable solution of this problem with many scenarios of wildfire risk and power injection variability, we develop a customized temporal decomposition method based on a progressive hedging framework. Extending traditional progressive hedging techniques, we consider coupling in both placement variables across all scenarios and state-of-charge variables at temporal boundaries. This enforces consistency across scenarios while enabling parallel computations despite both spatial and temporal coupling. The proposed decomposition facilitates efficient and scalable modeling of a full year of hourly operational decisions to inform the sizing and siting of batteries. With this decomposition, we model a year of hourly operational decisions to inform optimal battery placement for a 240-bus WECC model in under 70 minutes of wall-clock time.

[29]  arXiv:2404.12329 [pdf, other]
Title: Practical Considerations for Discrete-Time Implementations of Continuous-Time Control Barrier Function-Based Safety Filters
Comments: 7 pages, 4 figures, accepted for publication at the IEEE American Control Conference, 2024
Subjects: Systems and Control (eess.SY)

Safety filters based on control barrier functions (CBFs) have become a popular method to guarantee safety for uncertified control policies, e.g., as resulting from reinforcement learning. Here, safety is defined as staying in a pre-defined set, the safe set, that adheres to the system's state constraints, e.g., as given by lane boundaries for a self-driving vehicle. In this paper, we examine one commonly overlooked problem that arises in practical implementations of continuous-time CBF-based safety filters. In particular, we look at the issues caused by discrete-time implementations of the continuous-time CBF-based safety filter, especially for cases where the magnitude of the Lie derivative of the CBF with respect to the control input is zero or close to zero. When overlooked, this filter can result in undesirable chattering effects or constraint violations. In this work, we propose three mitigation strategies that allow us to use a continuous-time safety filter in a discrete-time implementation with a local relative degree. Using these strategies in augmented CBF-based safety filters, we achieve safety for all states in the safe set by either using an additional penalty term in the safety filtering objective or modifying the CBF such that those undesired states are not encountered during closed-loop operation. We demonstrate the presented issue and validate our three proposed mitigation strategies in simulation and on a real-world quadrotor.

Cross-lists for Fri, 19 Apr 24

[30]  arXiv:2404.11520 (cross-list from math.OC) [pdf, other]
Title: Equitably allocating wildfire resilience investments for power grids: The curse of aggregation and vulnerability indices
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)

Wildfires ignited by power systems infrastructure are among the most destructive wildfires; hence some utility companies in wildfire-prone regions have pursued a proactive policy of emergency power shutoffs. These shutoffs, while mitigating the risk of disastrous ignition events, result in power outages that could negatively impact vulnerable communities. In this paper, we consider how to equitably allocate funds to underground and effectively de-risk power lines in transmission networks. We explore the impact of the 2021 White House resource allocation policy called the Justice40 initiative, which states that 40% of the benefits of federally-funded climate-related investments should go to socially vulnerable communities. The definition of what constitutes a vulnerable community varies by organization, and we consider two major recently proposed vulnerability indices: the Justice40 index created under the 2021 White House and the Social Vulnerability Index (SVI) developed by the Centers for Disease Control and Prevention (CDC). Using a high-fidelity synthetic power grid dataset that matches the key features of the Texas transmission system, we show that allocating budget according to these two indices fails to reduce power outages for indigenous communities and those subject to high wildfire ignition risk. We discuss how aggregation of communities and "one size fits all" vulnerability indices might be the reasons for the misalignment between the goals of vulnerability indices and their realized impact in this particular case study. We provide a method of achieving an equitable investment plan by adding group-level protections on the percentage of load that is shed across each population group of interest.

[31]  arXiv:2404.11790 (cross-list from math.OC) [pdf, ps, other]
Title: Constrained Stochastic Recursive Momentum Successive Convex Approximation
Comments: 32 pages, 4 figures, journal submission
Subjects: Optimization and Control (math.OC); Signal Processing (eess.SP)

We consider stochastic optimization problems with functional constraints. If the objective and constraint functions are not convex, the classical stochastic approximation algorithms such as the proximal stochastic gradient descent do not lead to efficient algorithms. In this work, we put forth an accelerated SCA algorithm that utilizes the recursive momentum-based acceleration which is widely used in the unconstrained setting. Remarkably, the proposed algorithm also achieves the optimal SFO complexity, on par with that achieved by state-of-the-art (unconstrained) stochastic optimization algorithms, and matches the SFO-complexity lower bound for minimization of general smooth functions. At each iteration, the proposed algorithm entails constructing convex surrogates of the objective and the constraint functions, and solving the resulting convex optimization problem. A recursive update rule is employed to track the gradient of the objective function, and contributes to achieving faster convergence and improved SFO complexity. A key ingredient of the proof is a new parameterized version of the standard Mangasarian-Fromowitz Constraints Qualification, that allows us to bound the dual variables and hence establish that the iterates approach an $\epsilon$-stationary point. We also detail an obstacle-avoiding trajectory optimization problem that can be solved using the proposed algorithm, and show that its performance is superior to that of the existing algorithms. The performance of the proposed algorithm is also compared against that of a specialized sparse classification algorithm on a binary classification problem.

[32]  arXiv:2404.11807 (cross-list from cs.RO) [pdf, other]
Title: Continuous Dynamic Bipedal Jumping via Adaptive-model Optimization
Comments: 8 pages, 9 figures, submitted to IEEE RA-L for review and possible publication
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

Dynamic and continuous jumping remains an open yet challenging problem in bipedal robot control. The choice of dynamic models in trajectory optimization (TO) problems plays a huge role in trajectory accuracy and computation efficiency, which normally cannot be ensured simultaneously. In this letter, we propose a novel adaptive-model optimization approach, a unified framework of Adaptive-model TO and Adaptive-frequency Model Predictive Control (MPC), to effectively realize continuous and robust jumping on the HECTOR bipedal robot. The proposed Adaptive-model TO fuses adaptive-fidelity dynamics modeling of bipedal jumping motion for model fidelity necessities in different jumping phases to ensure trajectory accuracy and computation efficiency. In addition, conventional approaches have unsynchronized sampling frequencies in TO and real-time control, causing the framework to have mismatched modeling resolutions. We adapt the MPC sampling frequency based on the TO trajectory resolution in different phases for effective trajectory tracking. In hardware experiments, we have demonstrated robust and dynamic jumps covering a distance of up to 40 cm (57% of robot height). To verify the repeatability of this experiment, we run 53 jumping experiments and achieve a 90% success rate. In continuous jumps, we demonstrate continuous bipedal jumping with terrain height perturbations (up to 5 cm) and discontinuities (up to 20 cm gap).

[33]  arXiv:2404.11881 (cross-list from cs.IT) [pdf, other]
Title: Joint Transmitter and Receiver Design for Movable Antenna Enhanced Multicast Communications
Comments: 13 double-column single-spaced pages, 9 figures, submitted to IEEE journal for possible publication
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

Movable antenna (MA) is an emerging technology that utilizes localized antenna movement to pursue better channel conditions for enhancing communication performance. In this paper, we study the MA-enhanced multicast transmission from a base station equipped with multiple MAs to multiple groups of single-MA users. Our goal is to maximize the minimum weighted signal-to-interference-plus-noise ratio (SINR) among all the users by jointly optimizing the position of each transmit/receive MA and the transmit beamforming. To tackle this challenging problem, we first consider the single-group scenario and propose an efficient algorithm based on the techniques of alternating optimization and successive convex approximation. Particularly, when optimizing transmit or receive MA positions, we construct a concave lower bound for the signal-to-noise ratio (SNR) of each user by applying only the second-order Taylor expansion, which is more effective than existing works utilizing two-step approximations. The proposed design is then extended to the general multi-group scenario. Simulation results demonstrate that significant performance gains in terms of achievable max-min SNR/SINR can be obtained by our proposed algorithm over benchmark schemes. Additionally, the proposed algorithm can notably reduce the required amount of transmit power or antennas for achieving a target level of max-min SNR/SINR performance compared to benchmark schemes.

[34]  arXiv:2404.11938 (cross-list from cs.MM) [pdf, other]
Title: HyDiscGAN: A Hybrid Distributed cGAN for Audio-Visual Privacy Preservation in Multimodal Sentiment Analysis
Comments: 13 pages, IJCAI-2024
Subjects: Multimedia (cs.MM); Distributed, Parallel, and Cluster Computing (cs.DC); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Multimodal Sentiment Analysis (MSA) aims to identify speakers' sentiment tendencies in multimodal video content, raising serious concerns about privacy risks associated with multimodal data, such as voiceprints and facial images. Recent distributed collaborative learning has been verified as an effective paradigm for privacy preservation in multimodal tasks. However, such approaches often overlook the privacy distinctions among different modalities, struggling to strike a balance between performance and privacy preservation. Consequently, this poses the intriguing question of how to maximize multimodal utilization to improve performance while simultaneously protecting necessary modalities. This paper forms the first attempt at modality-specified (i.e., audio and visual) privacy preservation in MSA tasks. We propose a novel Hybrid Distributed cross-modality cGAN framework (HyDiscGAN), which learns multimodality alignment to generate fake audio and visual features conditioned on shareable de-identified textual data. The objective is to leverage the fake features to approximate real audio and visual content to guarantee privacy preservation while effectively enhancing performance. Extensive experiments show that compared with the state-of-the-art MSA model, HyDiscGAN can achieve superior or competitive performance while preserving privacy.

[35]  arXiv:2404.11976 (cross-list from cs.SD) [pdf, other]
Title: Large Language Models: From Notes to Musical Form
Authors: Lilac Atassi
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

While many topics of the learning-based approach to automated music generation are under active research, musical form is under-researched. In particular, recent methods based on deep learning models generate music that, at the largest time scale, lacks any structure. In practice, music longer than one minute generated by such models is either unpleasantly repetitive or directionless. Adapting a recent music generation model, this paper proposes a novel method to generate music with form. The experimental results show that the proposed method can generate 2.5-minute-long music that is considered as pleasant as the music used to train the model. The paper first reviews a recent music generation method based on language models (transformer architecture). We discuss why learning musical form by such models is infeasible. Then we discuss our proposed method and the experiments.

[36]  arXiv:2404.12018 (cross-list from cs.RO) [pdf, other]
Title: Automated Real-Time Inspection in Indoor and Outdoor 3D Environments with Cooperative Aerial Robots
Comments: 2024 International Conference on Unmanned Aircraft Systems (ICUAS)
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

This work introduces a cooperative inspection system designed to efficiently control and coordinate a team of distributed heterogeneous UAV agents for the inspection of 3D structures in cluttered, unknown spaces. Our proposed approach employs a two-stage innovative methodology. Initially, it leverages the complementary sensing capabilities of the robots to cooperatively map the unknown environment. It then generates optimized, collision-free inspection paths, thereby ensuring comprehensive coverage of the structure's surface area. The effectiveness of our system is demonstrated through qualitative and quantitative results from extensive Gazebo-based simulations that closely replicate real-world inspection scenarios, highlighting its ability to thoroughly inspect real-world-like 3D structures.

[37]  arXiv:2404.12062 (cross-list from cs.SD) [pdf, other]
Title: MIDGET: Music Conditioned 3D Dance Generation
Comments: 12 pages, 6 figures. Published in AI 2023: Advances in Artificial Intelligence
Journal-ref: In Australasian Joint Conference on Artificial Intelligence (pp. 277-288). Singapore: Springer Nature Singapore 2023
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Audio and Speech Processing (eess.AS)

In this paper, we introduce a MusIc conditioned 3D Dance GEneraTion model, named MIDGET, based on a Dance motion Vector Quantised Variational AutoEncoder (VQ-VAE) model and a Motion Generative Pre-Training (GPT) model, to generate vibrant and high-quality dances that match the music rhythm. To tackle challenges in the field, we introduce three new components: 1) a pre-trained memory codebook based on the Motion VQ-VAE model to store different human pose codes, 2) a Motion GPT model that generates pose codes with music and motion encoders, 3) a simple framework for music feature extraction. We compare with existing state-of-the-art models and perform ablation experiments on AIST++, the largest publicly available music-dance dataset. Experiments demonstrate that our proposed framework achieves state-of-the-art performance on motion quality and its alignment with the music.

[38]  arXiv:2404.12071 (cross-list from cs.IT) [pdf, ps, other]
Title: Complexity-Aware Theoretical Performance Analysis of SDM MIMO Equalizers
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

We propose a theoretical framework to compute, rapidly and accurately, the signal-to-noise ratio at the output of spatial-division multiplexing (SDM) linear MIMO equalizers with arbitrary numbers of spatial modes and filter taps and demonstrate three orders of magnitude of speed-up compared to Monte Carlo simulations.

[39]  arXiv:2404.12077 (cross-list from cs.SD) [pdf, other]
Title: TIMIT Speaker Profiling: A Comparison of Multi-task learning and Single-task learning Approaches
Authors: Rong Wang, Kun Sun
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

This study employs deep learning techniques to explore four speaker profiling tasks on the TIMIT dataset, namely gender classification, accent classification, age estimation, and speaker identification, highlighting the potential and challenges of multi-task learning versus single-task models. The motivation for this research is twofold: firstly, to empirically assess the advantages and drawbacks of multi-task learning over single-task models in the context of speaker profiling; secondly, to emphasize the undiminished significance of skillful feature engineering for speaker recognition tasks. The findings reveal challenges in accent classification, and multi-task learning is found advantageous for tasks of similar complexity. Non-sequential features are favored for speaker recognition, but sequential ones can serve as starting points for complex models. The study underscores the necessity of meticulous experimentation and parameter tuning for deep learning models.

[40]  arXiv:2404.12132 (cross-list from cs.SD) [pdf, other]
Title: Enhancing Suicide Risk Assessment: A Speech-Based Automated Approach in Emergency Medicine
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)

The delayed access to specialized psychiatric assessments and care for patients at risk of suicidal tendencies in emergency departments creates a notable gap in timely intervention, hindering the provision of adequate mental health support during critical situations. To address this, we present a non-invasive, speech-based approach for automatic suicide risk assessment. For our study, we have collected a novel dataset of speech recordings from 20 patients from which we extract three sets of features, including wav2vec, interpretable speech and acoustic features, and deep learning-based spectral representations. We proceed by conducting a binary classification to assess suicide risk in a leave-one-subject-out fashion. Our most effective speech model achieves a balanced accuracy of 66.2%. Moreover, we show that integrating our speech model with a series of patients' metadata, such as the history of suicide attempts or access to firearms, improves the overall result. The metadata integration yields a balanced accuracy of 94.4%, marking an absolute improvement of 28.2%, demonstrating the efficacy of our proposed approaches for automatic suicide risk assessment in emergency medicine.

[41]  arXiv:2404.12133 (cross-list from cs.IT) [pdf, ps, other]
Title: On Target Detection in the Presence of Clutter in Joint Communication and Sensing Cellular Networks
Journal-ref: 2023 16th International Conference on Signal Processing and Communication System (ICSPCS)
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

Recent works on joint communication and sensing (JCAS) cellular networks have proposed to use time division mode (TDM) and concurrent mode (CM) as alternative methods for sharing the resources between communication and sensing signals. While the performance of these JCAS schemes for object tracking and parameter estimation has been studied in previous works, their performance on target detection in the presence of clutter has not been analyzed. In this paper, we propose a detection scheme for estimating the number of targets in JCAS cellular networks that employ TDM or CM resource sharing. The proposed detection method allows for the presence of clutter and/or temporally correlated noise. This scheme is studied with respect to the JCAS trade-off parameters that control the time slots in TDM and the power resources in CM allocated to sensing and communications. The performance of two fundamental transmit beamforming schemes, typical for JCAS, is compared in terms of the receiver operating characteristic curves. Our results indicate that in general the TDM scheme gives a somewhat better detection performance compared to the CM scheme, although both schemes outperform existing approaches provided that their respective trade-off parameters are tuned properly.

[42]  arXiv:2404.12134 (cross-list from cs.AI) [pdf, other]
Title: Warped Time Series Anomaly Detection
Subjects: Artificial Intelligence (cs.AI); Signal Processing (eess.SP)

This paper addresses the problem of detecting time series outliers, focusing on systems with repetitive behavior, such as industrial robots operating on production lines. Notable challenges arise from the fact that a task performed multiple times may exhibit different duration in each repetition and that the time series reported by the sensors are irregularly sampled because of data gaps. The anomaly detection approach presented in this paper consists of three stages. The first stage identifies the repetitive cycles in the lengthy time series and segments them into individual time series corresponding to one task cycle, while accounting for possible temporal distortions. The second stage computes a prototype for the cycles using a GPU-based barycenter algorithm, specifically tailored for very large time series. The third stage uses the prototype to detect abnormal cycles by computing an anomaly score for each cycle. The overall approach, named WarpEd Time Series ANomaly Detection (WETSAND), makes use of the Dynamic Time Warping algorithm and its variants because they are suited to the distorted nature of the time series. The experiments show that WETSAND scales to large signals, computes human-friendly prototypes, works with very little data, and outperforms some general purpose anomaly detection approaches such as autoencoders.

[43]  arXiv:2404.12142 (cross-list from cs.CV) [pdf, other]
Title: SDIP: Self-Reinforcement Deep Image Prior Framework for Image Processing
Authors: Ziyu Shu, Zhixin Pan
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)

Deep image prior (DIP) proposed in recent research has revealed the inherent trait of convolutional neural networks (CNN) for capturing substantial low-level image statistics priors. This framework efficiently addresses the inverse problems in image processing and has induced extensive applications in various domains. However, as the whole algorithm is initialized randomly, the DIP algorithm often lacks stability. Thus, this method still has space for further improvement. In this paper, we propose the self-reinforcement deep image prior (SDIP) as an improved version of the original DIP. We observed that the changes in the DIP networks' input and output are highly correlated during each iteration. SDIP efficiently utilizes this trait in a reinforcement learning manner, where the current iteration's output is utilized by a steering algorithm to update the network input for the next iteration, guiding the algorithm toward improved results. Experimental results across multiple applications demonstrate that our proposed SDIP framework offers improvement compared to the original DIP method and other state-of-the-art methods.

[44]  arXiv:2404.12178 (cross-list from physics.soc-ph) [pdf, other]
Title: Designing a sector-coupled European energy system robust to 60 years of historical weather data
Subjects: Physics and Society (physics.soc-ph); Systems and Control (eess.SY)

As energy systems transform to rely on renewable energy and electrification, they encounter stronger year-to-year variability in energy supply and demand. However, most infrastructure planning is based on a single weather year, resulting in a lack of robustness. In this paper, we optimize energy infrastructure for a European energy system designed for net-zero CO2 emissions in 62 different weather years. Subsequently, we fix the capacity layouts and simulate their operation in every weather year, to evaluate resource adequacy and CO2 emissions abatement. We show that interannual weather variability causes variation of ±10% in total system cost. The most expensive capacity layout obtains the lowest net CO2 emissions but not the highest resource adequacy. Instead, capacity layouts designed with years including compound weather events result in a more robust and cost-effective design. Deploying CO2-emitting backup generation is a cost-effective robustness measure, which only increases CO2 emissions marginally as the average CO2 emissions remain less than 1% of 1990 levels. Our findings highlight how extreme weather years drive investments in robustness measures, making them compatible with all weather conditions within six decades of historical weather data.

[45]  arXiv:2404.12251 (cross-list from cs.LG) [pdf, other]
Title: Dynamic Modality and View Selection for Multimodal Emotion Recognition with Missing Modalities
Comments: 15 pages
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)

The study of human emotions, traditionally a cornerstone in fields like psychology and neuroscience, has been profoundly impacted by the advent of artificial intelligence (AI). Multiple channels, such as speech (voice) and facial expressions (image), are crucial in understanding human emotions. However, AI's journey in multimodal emotion recognition (MER) is marked by substantial technical challenges. One significant hurdle is how AI models manage the absence of a particular modality - a frequent occurrence in real-world situations. This study's central focus is assessing the performance and resilience of two strategies when confronted with the lack of one modality: a novel multimodal dynamic modality and view selection and a cross-attention mechanism. Results on the RECOLA dataset show that dynamic selection-based methods are a promising approach for MER. In the missing modalities scenarios, all dynamic selection-based methods outperformed the baseline. The study concludes by emphasizing the intricate interplay between audio and video modalities in emotion prediction, showcasing the adaptability of dynamic selection methods in handling missing modalities.

[46]  arXiv:2404.12257 (cross-list from cs.CV) [pdf, other]
Title: Food Portion Estimation via 3D Object Scaling
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Image and Video Processing (eess.IV)

Image-based methods to analyze food images have alleviated the user burden and biases associated with traditional methods. However, accurate portion estimation remains a major challenge due to the loss of 3D information in the 2D representation of foods captured by smartphone cameras or wearable devices. In this paper, we propose a new framework to estimate both food volume and energy from 2D images by leveraging the power of 3D food models and physical reference in the eating scene. Our method estimates the pose of the camera and the food object in the input image and recreates the eating occasion by rendering an image of a 3D model of the food with the estimated poses. We also introduce a new dataset, SimpleFood45, which contains 2D images of 45 food items and associated annotations including food volume, weight, and energy. Our method achieves an average error of 31.10 kCal (17.67%) on this dataset, outperforming existing portion estimation methods.

[47]  arXiv:2404.12299 (cross-list from cs.CL) [pdf, other]
Title: Simultaneous Interpretation Corpus Construction by Large Language Models in Distant Language Pair
Comments: 23 pages, 9 figures
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)

In Simultaneous Machine Translation (SiMT) systems, training with a simultaneous interpretation (SI) corpus is an effective method for achieving high-quality yet low-latency systems. However, it is very challenging to curate such a corpus due to limitations in the abilities of annotators, and hence, existing SI corpora are limited. Therefore, we propose a method to convert existing speech translation corpora into interpretation-style data, maintaining the original word order and preserving the entire source content using Large Language Models (LLM-SI-Corpus). We demonstrate that fine-tuning SiMT models in text-to-text and speech-to-text settings with the LLM-SI-Corpus reduces latencies while maintaining the same level of quality as the models trained with offline datasets. The LLM-SI-Corpus is available at https://github.com/yusuke1997/LLM-SI-Corpus.

[48]  arXiv:2404.12308 (cross-list from cs.RO) [pdf, other]
Title: ASID: Active Exploration for System Identification in Robotic Manipulation
Comments: Project website at https://weirdlabuw.github.io/asid
Subjects: Robotics (cs.RO); Machine Learning (cs.LG); Systems and Control (eess.SY)

Model-free control strategies such as reinforcement learning have shown the ability to learn control strategies without requiring an accurate model or simulator of the world. While this is appealing due to the lack of modeling requirements, such methods can be sample inefficient, making them impractical in many real-world domains. On the other hand, model-based control techniques leveraging accurate simulators can circumvent these challenges and use a large amount of cheap simulation data to learn controllers that can effectively transfer to the real world. The challenge with such model-based techniques is the requirement for an extremely accurate simulation, requiring both the specification of appropriate simulation assets and physical parameters. This requires considerable human effort to design for every environment being considered. In this work, we propose a learning system that can leverage a small amount of real-world data to autonomously refine a simulation model and then plan an accurate control strategy that can be deployed in the real world. Our approach critically relies on utilizing an initial (possibly inaccurate) simulator to design effective exploration policies that, when deployed in the real world, collect high-quality data. We demonstrate the efficacy of this paradigm in identifying articulation, mass, and other physical parameters in several challenging robotic manipulation tasks, and illustrate that only a small amount of real-world data can allow for effective sim-to-real transfer. Project website at https://weirdlabuw.github.io/asid

Replacements for Fri, 19 Apr 24

[49]  arXiv:2302.04143 (replaced) [pdf, other]
Title: Predicting Thrombectomy Recanalization from CT Imaging Using Deep Learning Models
Comments: Medical Imaging with Deep Learning 2022 accepted short paper Jun 2022
Journal-ref: Medical Imaging with Deep Learning 2022
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[50]  arXiv:2304.04974 (replaced) [pdf, other]
Title: Wav2code: Restore Clean Speech Representations via Codebook Lookup for Noise-Robust ASR
Comments: 12 pages, 7 figures, IEEE/ACM Transactions on Audio, Speech, and Language Processing
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[51]  arXiv:2306.11371 (replaced) [pdf, other]
Title: Visually grounded few-shot word learning in low-resource settings
Comments: Accepted to TASLP. arXiv admin note: substantial text overlap with arXiv:2305.15937
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL)
[52]  arXiv:2307.10757 (replaced) [pdf, other]
Title: Vesper: A Compact and Effective Pretrained Model for Speech Emotion Recognition
Comments: This paper was accepted by IEEE Transactions on Affective Computing 2024
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[53]  arXiv:2308.06981 (replaced) [pdf, other]
Title: The Sound Demixing Challenge 2023 – Cinematic Demixing Track
Comments: Accepted for Transactions of the International Society for Music Information Retrieval
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[54]  arXiv:2308.10913 (replaced) [pdf, other]
Title: Automated mapping of virtual environments with visual predictive coding
Subjects: Neurons and Cognition (q-bio.NC); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
[55]  arXiv:2309.11744 (replaced) [pdf, ps, other]
Title: Infinite Horizon Average Cost Optimality Criteria for Mean-Field Control
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
[56]  arXiv:2310.02141 (replaced) [pdf, other]
Title: Adaptive Gait Modeling and Optimization for Principally Kinematic Systems
Comments: 7 pages, 4 figures
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
[57]  arXiv:2311.06634 (replaced) [pdf, ps, other]
Title: Back to Basics: Fast Denoising Iterative Algorithm
Authors: Deborah Pereg
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[58]  arXiv:2311.09590 (replaced) [pdf, other]
Title: MARformer: An Efficient Metal Artifact Reduction Transformer for Dental CBCT Images
Comments: under consideration of Computer Vision and Image Understanding journal
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[59]  arXiv:2311.14422 (replaced) [pdf, ps, other]
Title: Investigation on the Impact of Heat Waves on Distribution System Failures
Subjects: Systems and Control (eess.SY)
[60]  arXiv:2312.02734 (replaced) [pdf, ps, other]
Title: Geometric Data-Driven Dimensionality Reduction in MPC with Guarantees
Comments: This paper is presented at ECC 2024
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
[61]  arXiv:2312.05916 (replaced) [pdf, ps, other]
Title: Switching Frequency Limitation with Finite Control Set Model Predictive Control via Slack Variables
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
[62]  arXiv:2401.04239 (replaced) [pdf, ps, other]
Title: The Required Spatial Resolution to Assess Imbalance using Plantar Pressure Mapping
Subjects: Signal Processing (eess.SP); Instrumentation and Detectors (physics.ins-det); Medical Physics (physics.med-ph); Quantitative Methods (q-bio.QM)
[63]  arXiv:2401.15663 (replaced) [pdf, other]
Title: Low-resolution Prior Equilibrium Network for CT Reconstruction
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[64]  arXiv:2402.07673 (replaced) [pdf, ps, other]
Title: A Computational Model of the Electrically or Acoustically Evoked Compound Action Potential in Cochlear Implant Users with Residual Hearing
Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Medical Physics (physics.med-ph); Audio and Speech Processing (eess.AS)
[65]  arXiv:2402.17300 (replaced) [pdf, other]
Title: VoCo: A Simple-yet-Effective Volume Contrastive Learning Framework for 3D Medical Image Analysis
Comments: Accepted by CVPR 2024. The camera-ready version will soon be available
Subjects: Image and Video Processing (eess.IV)
[66]  arXiv:2402.17402 (replaced) [pdf, other]
Title: Beacon, a lightweight deep reinforcement learning benchmark library for flow control
Subjects: Computational Physics (physics.comp-ph); Machine Learning (cs.LG); Systems and Control (eess.SY)
[67]  arXiv:2403.01194 (replaced) [pdf, other]
Title: A Comparative Study of Rapidly-exploring Random Tree Algorithms Applied to Ship Trajectory Planning and Behavior Generation
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
[68]  arXiv:2403.01255 (replaced) [pdf, other]
Title: Automatic Speech Recognition using Advanced Deep Learning Approaches: A survey
Journal-ref: Information Fusion, Elsevier, 2024
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[69]  arXiv:2403.04062 (replaced) [pdf, other]
Title: Chance-Constrained Control for Safe Spacecraft Autonomy: Convex Programming Approach
Authors: Kenshiro Oguri
Comments: Accepted for 2024 IEEE American Control Conference
Subjects: Optimization and Control (math.OC); Robotics (cs.RO); Systems and Control (eess.SY)
[70]  arXiv:2403.04923 (replaced) [pdf, other]
Title: Control-based Graph Embeddings with Data Augmentation for Contrastive Learning
Comments: Accepted in 2024 American Control Conference (ACC), July 8-12, 2024 in Toronto, ON, Canada
Subjects: Machine Learning (cs.LG); Multiagent Systems (cs.MA); Systems and Control (eess.SY)
[71]  arXiv:2404.03914 (replaced) [pdf, other]
Title: Open vocabulary keyword spotting through transfer learning from speech synthesis
Subjects: Human-Computer Interaction (cs.HC); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[72]  arXiv:2404.06714 (replaced) [pdf, other]
Title: Llama-VITS: Enhancing TTS Synthesis with Semantic Awareness
Comments: 9 pages, 2 figures, 4 tables; accepted at LREC-COLING 2024
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[73]  arXiv:2404.08224 (replaced) [src]
Title: HCL-MTSAD: Hierarchical Contrastive Consistency Learning for Accurate Detection of Industrial Multivariate Time Series Anomalies
Comments: This paper is a manuscript that is still in the process of revision, including Table 1, Figure 2, problem definition in section III.B and method description proposed in section IV. In addition, the submitter has not been authorized by the first author and other co-authors to post the paper to arXiv
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Information Theory (cs.IT); Systems and Control (eess.SY)
[74]  arXiv:2404.09037 (replaced) [pdf, other]
Title: Intention-Aware Control Based on Belief-Space Specifications and Stochastic Expansion
Subjects: Systems and Control (eess.SY)
[75]  arXiv:2404.09683 (replaced) [pdf, other]
Title: Post-Training Network Compression for 3D Medical Image Segmentation: Reducing Computational Efforts via Tucker Decomposition
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[76]  arXiv:2404.10520 (replaced) [pdf, ps, other]
Title: A Game-Theoretic Approach for PMU Deployment Against False Data Injection Attacks
Subjects: Systems and Control (eess.SY)
[77]  arXiv:2404.11194 (replaced) [pdf, other]
Title: Simultaneous compensation of input delay and state/input quantization for linear systems via switched predictor feedback
Comments: 12 pages, 7 figures, submitted to Systems & Control Letters
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY); Analysis of PDEs (math.AP)
[78]  arXiv:2404.11278 (replaced) [pdf, other]
Title: Study on the static detection of ICF target based on muonic X-ray sphere encoded imaging
Subjects: Instrumentation and Detectors (physics.ins-det); Image and Video Processing (eess.IV)
[79]  arXiv:2404.11525 (replaced) [pdf, other]
Title: JointViT: Modeling Oxygen Saturation Levels with Joint Supervision on Long-Tailed OCTA
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)