Electrical Engineering and Systems Science

New submissions for Thu, 8 Dec 22

[1]  arXiv:2212.03311 [pdf, ps, other]
Title: Co-optimizing Behind-The-Meter Resources under Net Metering
Comments: 8 pages, 4 figures, 4 tables
Subjects: Systems and Control (eess.SY)

We consider the problem of co-optimizing behind-the-meter (BTM) storage and flexible demands with BTM stochastic renewable generation. Under a generalized net energy metering (NEM) policy, NEM X, we show that the optimal co-optimization policy schedules the flexible demands based on a load priority list that defers less prioritized loads to times when the BTM generation is abundant. This gives rise to the notion of a net-zero zone, which we quantify under different distributed energy resources (DER) compositions. We highlight the special case of inflexible demands that results in a storage policy that minimizes the imports and exports from and to the grid. Comparative statics are provided on the optimal co-optimization policy. Simulations using real residential data show the surplus gains of various customers under different DER compositions.

[2]  arXiv:2212.03372 [pdf, other]
Title: Real-time rapid leakage estimation for deep space habitats using exponentially-weighted adaptively-refined search
Subjects: Signal Processing (eess.SP); Systems and Control (eess.SY)

The recent accelerated growth in space-related research and development activities makes the near-term need for long-term extraterrestrial habitats evident. Such habitats must operate under continuous disruptive conditions arising from extreme environments like meteoroid impacts, extreme temperature fluctuations, galactic cosmic rays, destructive dust, and seismic events. Loss of air or atmospheric leakage from a habitat poses safety challenges that demand proper attention. Such leakage may arise from micro-meteoroid impacts, crack growth, bolt/rivet loosening, and seal deterioration. In this paper, leakage estimation in deep space habitats is posed as an inverse problem. A forward pressure-based dynamical model is formulated for atmospheric leakage. Experiments are performed on a small-scale pressure chamber where different leakage scenarios are emulated and corresponding pressure values are measured. An exponentially-weighted adaptively-refined search (EWARS) algorithm is developed and validated for the inverse problem of real-time leakage estimation. It is demonstrated that the proposed methodology can achieve real-time estimation and tracking of constant and variable leaks with accuracy.

[3]  arXiv:2212.03377 [pdf, other]
Title: GaitVibe+: Enhancing Structural Vibration-based Footstep Localization Using Temporary Cameras for In-home Gait Analysis
Comments: 7 pages, 7 figures
Subjects: Signal Processing (eess.SP); Computational Engineering, Finance, and Science (cs.CE)

In-home gait analysis is important for providing early diagnosis and adaptive treatments for individuals with gait disorders. Existing systems include wearables and pressure mats, but they have limited scalability. Recent studies have developed vision-based systems to enable scalable, accurate in-home gait analysis, but these face privacy concerns due to the exposure of people's appearances. Our prior work developed footstep-induced structural vibration sensing for gait monitoring, which is device-free, wide-ranging, and perceived as more privacy-friendly. Although it has succeeded in temporal gait event extraction, it shows limited performance for spatial gait parameter estimation due to imprecise footstep localization. In particular, the localization error mainly comes from the estimation error of the wave arrival time at the vibration sensors and its propagation to wave velocity estimates. Therefore, we present GaitVibe+, a vibration-based footstep localization method fused with temporarily installed cameras for in-home gait analysis. Our method has two stages: fusion and operating. In the fusion stage, both cameras and vibration sensors are installed to record only a few trials of the subject's footstep data, through which we characterize the uncertainty in wave arrival time and model the wave velocity profiles for the given structure. In the operating stage, we remove the cameras to preserve privacy at home. The footstep localization is conducted by estimating the time difference of arrival (TDoA) over multiple vibration sensors, whose accuracy is improved through the reduced uncertainty and the velocity modeling from the fusion stage. We evaluate GaitVibe+ through a real-world experiment with 50 walking trials. With only 3 trials of multi-modal fusion, our approach has an average localization error of 0.22 meters, which reduces the spatial gait parameter error from 111% to 27%.
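
The TDoA step described in this abstract can be illustrated with a minimal sketch. It is not the GaitVibe+ implementation: it assumes a single constant wave velocity (the paper models velocity profiles), noise-free arrival times, and a hypothetical square sensor layout, and the function name `tdoa_localize` and all parameters are introduced here for illustration only. It grid-searches the floor position whose predicted arrival-time differences best match the measured ones.

```python
import numpy as np

def tdoa_localize(sensors, tdoas, velocity, grid_step=0.05, extent=5.0):
    """Grid-search the 2-D point whose predicted TDoAs best match the measured ones.

    sensors  : (n, 2) array of sensor coordinates in meters (hypothetical layout)
    tdoas    : (n - 1,) arrival-time differences relative to sensor 0, in seconds
    velocity : assumed constant wave speed in m/s (a simplifying assumption)
    """
    xs = np.arange(0.0, extent + grid_step, grid_step)
    gx, gy = np.meshgrid(xs, xs)
    candidates = np.stack([gx.ravel(), gy.ravel()], axis=1)            # (m, 2)
    # Distance from every candidate point to every sensor.
    dists = np.linalg.norm(candidates[:, None, :] - sensors[None, :, :], axis=2)
    # Predicted time differences relative to sensor 0.
    predicted = (dists[:, 1:] - dists[:, :1]) / velocity
    squared_error = np.sum((predicted - tdoas) ** 2, axis=1)
    return candidates[np.argmin(squared_error)]

# Synthetic usage: four sensors at the corners of a 5 m x 5 m floor.
sensors = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0], [5.0, 5.0]])
true_step = np.array([1.5, 3.0])
velocity = 300.0  # illustrative value only
arrival = np.linalg.norm(sensors - true_step, axis=1) / velocity
estimate = tdoa_localize(sensors, arrival[1:] - arrival[0], velocity)
```

Because predicted time differences scale inversely with the assumed velocity, an error in the velocity shifts the best-matching grid point, which is consistent with the abstract's emphasis on velocity modeling.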

[4]  arXiv:2212.03378 [pdf, other]
Title: PigV$^2$: Monitoring Pig Vital Signs through Ground Vibrations Induced by Heartbeat and Respiration
Comments: 7 pages, 9 figures
Subjects: Signal Processing (eess.SP); Applied Physics (physics.app-ph)

Pig vital sign monitoring (e.g., estimating the heart rate (HR) and respiratory rate (RR)) is essential to understand the stress level of the sow and detect the onset of parturition. It helps to maximize peri-natal survival and improve animal well-being in swine production. The existing approach mainly relies on manual measurement, which is labor-intensive and only provides a few points of information. Other sensing modalities such as wearables and cameras are developed to enable more continuous measurement, but are still limited due to animal discomfort, data transfer, and storage challenges. In this paper, we introduce PigV$^2$, the first system to monitor pig heart rate and respiratory rate through ground vibrations. Our approach leverages the insight that both heartbeat and respiration generate ground vibrations when the sow is lying on the floor. We infer vital information by sensing and analyzing these vibrations. The main challenge in developing PigV$^2$ is the overlap of vital- and non-vital-related information in the vibration signals, including pig movements, pig postures, pig-to-sensor distances, and so on. To address this issue, we first characterize their effects, extract their current status, and then reduce their impact by adaptively interpolating vital rates over multiple sensors. PigV$^2$ is evaluated through a real-world deployment with 30 pigs. It has 3.4% and 8.3% average errors in monitoring the HR and RR of the sows, respectively.

[5]  arXiv:2212.03388 [pdf]
Title: Determination of Optimal Size and Number of Movable Energy Resources for Distribution System Resilience Enhancement
Comments: 11 pages, 6 figures, presented at the 2022 CIGRE Grid of the Future Symposium
Subjects: Systems and Control (eess.SY)

This paper proposes an approach based on graph theory and combinatorial enumeration for sizing of movable energy resources (MERs) to improve the resilience of the electric power supply. The proposed approach determines the size and number of MERs to be deployed in a distribution system to ensure the quickest possible recovery of the distribution system following an extreme event. The proposed approach starts by generating multiple line outage scenarios based on fragility curves of distribution lines. The generated scenarios are reduced using the k-means method. The distribution network is modeled as a graph where distribution network reconfiguration is performed for each reduced line outage scenario. The combinatorial enumeration technique is used to compute all combinations of total MER by size and number. The expected load curtailment (ELC) corresponding to each locational combination of MERs is determined. The minimum ELCs of all combinations of total MER are used to construct a minimum ELC matrix, which is later utilized to determine optimal size and number of MERs. The proposed approach is validated through a case study performed on a 33-node distribution test system.

[6]  arXiv:2212.03391 [pdf, other]
Title: Robo-Chargers: Optimal Operation and Planning of a Robotic Charging System to Alleviate Overstay
Subjects: Systems and Control (eess.SY)

Charging infrastructure availability is a major concern for plug-in electric vehicle users. Nowadays, the limited public chargers are commonly occupied by vehicles that have already been fully charged, hindering others' access, a phenomenon that we refer to as overstay. In this paper, we analyze a charging facility innovation to tackle the challenge of overstay, leveraging the idea of Robo-chargers: automated chargers that can rotate in a charging station and proactively plug or unplug plug-in electric vehicles. We formalize an operation model for stations incorporating Fixed-chargers and Robo-chargers. Optimal scheduling can be solved with the recognition of the combinatorial nature of vehicle-charger assignments, charging dynamics, and customer waiting behaviors. Then, with the operation model nested, we develop a planning model to guide economical investment in both types of chargers so that the total cost of ownership is minimized. In the planning phase, the model further considers charging demand variances and service capacity requirements. We provide systematic techno-economic methods to evaluate whether introducing Robo-chargers is beneficial in a specific application scenario. Comprehensive sensitivity analysis based on real-world data highlights the advantages of Robo-chargers, especially in scenarios where overstay is severe. Validations also suggest the tractability of the operation model and the robustness of the planning results for real-time application under reasonable model mismatches, uncertainties, and disturbances.

[7]  arXiv:2212.03398 [pdf, other]
Title: Analysis and Utilization of Entrainment on Acoustic and Emotion Features in User-agent Dialogue
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)

Entrainment is the phenomenon by which an interlocutor adapts their speaking style to align with their partner in conversations. It has been found in different dimensions, such as acoustic, prosodic, lexical, or syntactic. In this work, we explore and utilize the entrainment phenomenon to improve spoken dialogue systems for voice assistants. We first examine the existence of the entrainment phenomenon in human-to-human dialogues with respect to acoustic features and then extend the analysis to emotion features. The analysis results show strong evidence of entrainment in terms of both acoustic and emotion features. Based on these findings, we implement two entrainment policies and assess whether integrating the entrainment principle into a Text-to-Speech (TTS) system improves the synthesis performance and the user experience. We find that integrating the entrainment principle into a TTS system improves performance when considering acoustic features, while no obvious improvement is observed when considering emotion features.

[8]  arXiv:2212.03401 [pdf, other]
Title: MIMO-DBnet: Multi-channel Input and Multiple Outputs DOA-aware Beamforming Network for Speech Separation
Comments: Submitted to ICASSP 2023
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)

Recently, many deep learning based beamformers have been proposed for multi-channel speech separation. Nevertheless, most of them rely on extra cues known in advance, such as speaker features, face images, or directional information. In this paper, we propose MIMO-DBnet, an end-to-end beamforming network for direction-guided speech separation given merely the mixture signal. Specifically, we design a multi-channel input and multiple outputs architecture to predict the direction-of-arrival based embeddings and beamforming weights for each source. The precisely estimated directional embedding provides effective spatial discrimination guidance for the neural beamformer to offset the effect of phase wrapping, thus allowing more accurate reconstruction of the two sources' speech signals. Experiments show that our proposed MIMO-DBnet not only achieves a comprehensive improvement over baseline systems, but also maintains its performance on high-frequency bands when phase wrapping occurs.

[9]  arXiv:2212.03408 [pdf, other]
Title: Selector-Enhancer: Learning Dynamic Selection of Local and Non-local Attention Operation for Speech Enhancement
Comments: Accepted by AAAI 2023
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

Attention mechanisms, such as local and non-local attention, play a fundamental role in recent deep learning based speech enhancement (SE) systems. However, natural speech contains many fast-changing and relatively brief acoustic events, so capturing the most informative speech features by indiscriminately using local and non-local attention is challenging. We observe that the noise type and speech features vary within a sequence of speech and that the local and non-local operations can respectively extract different features from corrupted speech. To leverage this, we propose Selector-Enhancer, a dual-attention based convolutional neural network (CNN) with a feature-filter that can dynamically select regions from low-resolution speech features and feed them to local or non-local attention operations. In particular, the proposed feature-filter is trained using reinforcement learning (RL) with a difficulty-regulated reward that is related to network performance, model complexity, and "the difficulty of the SE task". The results show that our method achieves comparable or superior performance to existing approaches. In particular, Selector-Enhancer is potentially effective for real-world denoising, where the number and types of noise vary within a single noisy mixture.

[10]  arXiv:2212.03470 [pdf, other]
Title: Improving trajectory localization accuracy via direction-of-arrival estimation
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

Sound source localization is crucial in acoustic sensing and monitoring-related applications. In this paper, we conduct a comprehensive analysis of the improvement in sound source localization obtained by combining the directions of arrival (DOAs) with their derivatives, which quantify the changes in the positions of sources over time. This study uses the SALSA-Lite feature with a convolutional recurrent neural network (CRNN) model for predicting DOAs and their first-order derivatives. An update rule is introduced to combine the predicted DOAs with the estimated derivatives to obtain the final DOAs. The experimental validation is done using the TAU-NIGENS Spatial Sound Events (TNSSE) 2021 dataset. We compare the performance of the networks predicting DOAs with derivatives vs. the one predicting only the DOAs at low SNR levels. The results show that combining the derivatives with the DOAs improves the localization accuracy of moving sources.
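
The abstract does not state the update rule itself, so the following is only a hypothetical sketch of how a predicted DOA track could be combined with predicted first-order derivatives. The blending weight `alpha`, the time step `dt`, and the function name `fuse_doa` are assumptions made for illustration. Each fused estimate blends the network's current DOA prediction with an extrapolation of the previous fused estimate along the predicted derivative. Angle wrap-around is ignored for brevity.

```python
import numpy as np

def fuse_doa(pred_doa, pred_deriv, alpha=0.5, dt=1.0):
    """Blend per-frame DOA predictions with derivative-based extrapolation.

    pred_doa   : predicted DOA per frame (degrees)
    pred_deriv : predicted first-order derivative per frame (degrees per frame)
    alpha      : hypothetical weight on the direct DOA prediction
    """
    pred_doa = np.asarray(pred_doa, dtype=float)
    pred_deriv = np.asarray(pred_deriv, dtype=float)
    fused = np.empty_like(pred_doa)
    fused[0] = pred_doa[0]
    for t in range(1, len(pred_doa)):
        # Where the source should be if it kept moving at the predicted rate.
        extrapolated = fused[t - 1] + pred_deriv[t - 1] * dt
        fused[t] = alpha * pred_doa[t] + (1.0 - alpha) * extrapolated
    return fused

# A source moving at 2 degrees per frame, with one outlier DOA prediction.
noisy_doa = [0.0, 2.0, 10.0, 6.0]
derivative = [2.0, 2.0, 2.0, 2.0]
fused_track = fuse_doa(noisy_doa, derivative)
```

In this toy track the true DOA at the third frame is 4 degrees; the fused value lies closer to it than the raw outlier prediction does, which is the intuition behind using derivatives for moving sources.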

[11]  arXiv:2212.03476 [pdf, ps, other]
Title: Improved Self-Supervised Multilingual Speech Representation Learning Combined with Auxiliary Language Information
Comments: Submitted to ICASSP 2023
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)

Multilingual end-to-end models have shown great improvement over monolingual systems. With the development of pre-training methods on speech, self-supervised multilingual speech representation learning like XLSR has shown success in improving the performance of multilingual automatic speech recognition (ASR). However, similar to supervised learning, multilingual pre-training may also suffer from language interference, which further affects the application of multilingual systems. In this paper, we introduce several techniques for improving self-supervised multilingual pre-training by leveraging auxiliary language information, including language adversarial training, language embedding, and language adaptive training during the pre-training stage. We conduct experiments on a multilingual ASR task consisting of 16 languages. Our experimental results demonstrate a 14.3% relative gain over the standard XLSR model and a 19.8% relative gain over the multilingual model without pre-training.

[12]  arXiv:2212.03480 [pdf, other]
Title: Progressive Multi-Scale Self-Supervised Learning for Speech Recognition
Comments: Submitted to ICASSP 2023
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

Self-supervised learning (SSL) models have achieved considerable improvements in automatic speech recognition (ASR). In theory, ASR performance could be further improved if the model were dedicated to learning audio content information. To this end, we propose a progressive multi-scale self-supervised learning (PMS-SSL) method, which uses fine-grained target sets to compute the SSL loss at the top layer while using coarse-grained target sets at intermediate layers. Furthermore, PMS-SSL introduces a multi-scale structure into multi-head self-attention for better speech representation, which restricts the attention area to a large scope at higher layers and to a small scope at lower layers. Experiments on the LibriSpeech dataset indicate the effectiveness of our proposed method. Compared with HuBERT, PMS-SSL achieves 13.7% / 12.7% relative WER reduction on the test-other evaluation subset when fine-tuned on the 10-hour / 100-hour subsets, respectively.
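
For readers unfamiliar with the metric, a relative WER reduction is the drop in word error rate expressed as a percentage of the baseline WER. The small sketch below shows the arithmetic. The function name and the absolute WER values in the usage line are hypothetical and chosen only to illustrate the formula; the abstract reports relative reductions, not these absolute numbers.

```python
def relative_wer_reduction(baseline_wer, new_wer):
    """Return the relative WER reduction in percent: 100 * (baseline - new) / baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer

# Hypothetical absolute WERs (not taken from the paper).
example_reduction = relative_wer_reduction(baseline_wer=10.0, new_wer=8.63)
```

With these made-up inputs, a baseline of 10.0% and a new WER of 8.63% correspond to a relative reduction of about 13.7%.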

[13]  arXiv:2212.03482 [pdf, other]
Title: Improved Speech Pre-Training with Supervision-Enhanced Acoustic Unit
Comments: Submitted to ICASSP 2023
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

Speech pre-training has shown great success in learning useful and general latent representations from large-scale unlabeled data. Based on a well-designed self-supervised learning pattern, pre-trained models can serve many downstream speech tasks such as automatic speech recognition. To take full advantage of the labeled data in low-resource tasks, we present an improved pre-training method that introduces a supervision-enhanced acoustic unit (SEAU) pattern to intensify the expression of context information and reduce the training cost. Encoder representations extracted from the SEAU pattern are used to generate more representative target units for the HuBERT pre-training process. The proposed method, named SeHuBERT, achieves relative word error rate reductions of 10.5% and 4.9% compared with the standard HuBERT on a Turkmen speech recognition task with 500 hours and 100 hours of fine-tuning data, respectively. Extended to more languages and more data, SeHuBERT can also achieve a relative word error rate reduction of approximately 10% at half of the training cost compared with HuBERT.

[14]  arXiv:2212.03505 [pdf]
Title: A Four-stage Heuristic Algorithm for Solving On-demand Meal Delivery Routing Problem
Subjects: Systems and Control (eess.SY)

Meal delivery services provided by platforms with integrated delivery systems are becoming increasingly popular. This paper adopts a rolling horizon approach to solve the meal delivery routing problem (MDRP). To improve delivery efficiency in scenarios with high delivery demand, multiple orders are allowed to be combined into one bundle with orders from different restaurants. Following this strategy, an optimization-based four-stage heuristic algorithm is developed to generate an optimal routing plan at each decision point. First, the algorithm generates bundles according to the orders' spatial and temporal distribution. Second, it finds feasible bundle pairs. Third, routes for delivering any single bundle or multiple bundles are optimized. Finally, the routes are assigned to available couriers. In computational experiments using instances from open datasets, the system's performance is evaluated in terms of average click-to-door time and ready-to-pickup time. We demonstrate that this algorithm can effectively process real-time information and assign optimal routes to the couriers. A comparison of the proposed method with existing state-of-the-art algorithms indicates that our method can generate solutions with higher service quality and shorter distance.

[15]  arXiv:2212.03511 [pdf, other]
Title: Multi-Objective Model-Predictive Control for Dielectric Elastomer Wave Harvesters
Subjects: Systems and Control (eess.SY)

This contribution deals with multi-objective model-predictive control (MPC) of a wave energy converter (WEC) device concept, which can harvest energy from sea waves using a dielectric elastomer generator (DEG) power take-off system. We aim to maximise the extracted energy through control while minimising the accumulated damage to the DEG. With reference to system operation in stochastic waves, we first generate ground truth solutions by solving an optimal control problem, and we analyse the MPC performance to determine a prediction horizon that trades off accuracy and computational efficiency. Fixed weights in the MPC scheme can produce unpredictable costs under variable sea conditions, meaning the average rate of cost accumulation can vary vastly. To steer this cost growth, we propose a heuristic that adapts the algorithm by changing the weighting of the cost functions so as to fulfil the long-term goal of accumulating sufficiently small damage over a fixed time. A simulated case study is presented to evaluate the performance of the proposed MPC framework and the weight-adaptation algorithm. The proposed heuristic proves able to limit the amount of accumulated damage while remaining close to (or even improving) the energy yield obtained with a comparable fixed-weight MPC.

[16]  arXiv:2212.03515 [pdf, other]
Title: FPGA Implementation of Multi-Layer Machine Learning Equalizer with On-Chip Training
Comments: To be presented at the 2023 Optical Fiber Communication Conference (OFC)
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT); Machine Learning (cs.LG); Machine Learning (stat.ML)

We design and implement an adaptive machine learning equalizer that alternates multiple linear and nonlinear computational layers on an FPGA. On-chip training via gradient backpropagation is shown to allow for real-time adaptation to time-varying channel impairments.

[17]  arXiv:2212.03525 [pdf, other]
Title: Superimposed Pilot-based Channel Estimation for RIS-Assisted IoT Systems Using Lightweight Networks
Comments: 11 pages, 7 figures
Subjects: Signal Processing (eess.SP)

Conventional channel estimation (CE) for Internet of Things (IoT) systems encounters challenges such as low spectral efficiency, high energy consumption, and blocked propagation paths. Although superimposed pilot-based CE schemes and the reconfigurable intelligent surface (RIS) could partially tackle these challenges, limited research has been done on a systematic solution. In this paper, a superimposed pilot-based CE scheme with an RIS-assisted mode is proposed, and its performance is further enhanced by neural networks. Specifically, at the user equipment (UE), the pilot for CE is superimposed on the uplink user data to improve the spectral efficiency and energy consumption of IoT systems, and two lightweight networks at the base station (BS) alleviate the computational complexity and processing delay of the CE and symbol detection (SD). These dedicated networks are developed in a cooperative manner. That is, conventional methods are employed to perform initial feature extraction, and the developed neural networks (NNs) learn from the extracted features. With the assistance of the extracted initial features, the amount of training data needed for network training is reduced. Simulation results show that the computational complexity and processing delay are decreased without sacrificing the accuracy of CE and SD, and that the normalized mean square error (NMSE) and bit error rate (BER) performance at the BS is more robust to parameter variance.

[18]  arXiv:2212.03540 [pdf, other]
Title: Policy Transfer via Enhanced Action Space
Comments: 14 pages
Subjects: Systems and Control (eess.SY); Robotics (cs.RO)

Though transfer learning is promising for increasing learning efficiency, existing methods still struggle with long-horizon tasks, especially when expert policies are sub-optimal and only partially useful. Hence, a novel algorithm named EASpace (Enhanced Action Space) is proposed in this paper to transfer the knowledge of multiple sub-optimal expert policies. EASpace formulates each expert policy into multiple macro actions with different execution time periods, then integrates all macro actions into the primitive action space directly. Through this formulation, EASpace can learn when to execute which expert policy and for how long. An intra-macro-action learning rule is proposed that adjusts the temporal difference target of macro actions to improve data efficiency and alleviate the non-stationarity issue in multi-agent settings. Furthermore, an additional reward proportional to the execution time of macro actions is introduced to encourage exploration of the environment via macro actions, which is important for learning a long-horizon task. Theoretical analysis is presented to show the convergence of the proposed algorithm. The efficiency of the proposed algorithm is illustrated by a grid-based game and a multi-agent pursuit problem. The proposed algorithm is also implemented on real physical systems to verify its effectiveness.

[19]  arXiv:2212.03549 [pdf, ps, other]
Title: An Analytical Framework for Downlink LEO Satellite Communications based on Cox Point Processes
Comments: Submitted to IEEE Journal
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)

This work develops an analytical framework for downlink low earth orbit (LEO) satellite communications, leveraging tools from stochastic geometry. We propose a tractable approach to the analysis of such satellite communication systems accounting for the fact that satellites are located on circular orbits. We accurately characterize this geometric property of such LEO satellite constellations by developing a Cox point process model that jointly produces orbits and satellites on these orbits. Our work differs from existing studies that have assumed satellites' locations as completely random binomial point processes. For this Cox model, we derive the outage probability of the proposed network and the distribution of the signal-to-interference-plus-noise ratio (SINR) of an arbitrarily located user in the network. By determining various network performance metrics as functions of key network parameters, this work allows one to assess the statistical properties of downlink LEO satellite communications and thus can be used as a system-level design tool.

[20]  arXiv:2212.03564 [pdf, other]
Title: Optimizing a Digital Twin for Fault Diagnosis in Grid Connected Inverters -- A Bayesian Approach
Journal-ref: 2022 IEEE Energy Conversion Congress and Exposition (ECCE), 2022
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)

In this paper, a hyperparameter tuning based Bayesian optimization of digital twins is carried out to diagnose various faults in grid connected inverters. As fault detection and diagnosis require very high precision, we channel our efforts towards an online optimization of the digital twins, which, in turn, allows a flexible implementation with a limited amount of data. As a result, the proposed framework not only becomes a practical solution for model versioning and deployment of digital twin designs with limited data, but also allows integration of deep learning tools to improve the hyperparameter tuning capabilities. For classification performance assessment, we consider different fault cases in virtual synchronous generator (VSG) controlled grid-forming converters and demonstrate the efficacy of our approach. Our research outcomes reveal the increased accuracy and fidelity levels achieved by our digital twin design, overcoming the shortcomings of traditional hyperparameter tuning methods.

[21]  arXiv:2212.03604 [pdf, ps, other]
Title: Online Feedback Optimization of Compressor Stations with Model Adaptation using Gaussian Process Regression
Subjects: Systems and Control (eess.SY)

Online Feedback Optimization is a method used to steer the operation of a process plant to its optimal operating point without explicitly solving a nonlinear constrained optimization problem. This is achieved by leveraging a linear plant model and feedback from measurements. However, the presence of plant-model mismatch leads to suboptimal results when using this approach. Learning the plant-model mismatch enables Online Feedback Optimization to overcome this shortcoming. In this work, we present a novel application of Online Feedback Optimization with online model adaptation using Gaussian Process regression. We demonstrate our approach with a realistic load sharing problem in a compressor station with parametric and structural plant-model mismatch. We assume imperfect knowledge of the compressor maps and design an Online Feedback Optimization controller that minimizes the compressor station power consumption. In the evaluated scenario, imperfect knowledge of the plant leads to a 5% increase in power consumption compared to the case with perfect knowledge. We demonstrate that Online Feedback Optimization with model adaptation reduces this increase to only 0.8%, closely approximating the case of perfect knowledge of the plant, regardless of the type of mismatch.
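
The feedback idea in this abstract can be sketched on a toy scalar problem. This is not the paper's controller or compressor model: the plant `plant(u)`, the fixed linear sensitivity `H_MODEL`, the quadratic cost, and the step size are all assumptions chosen for illustration, and no Gaussian Process adaptation is included. Each iteration measures the plant output and takes a projected gradient step that uses the measurement together with the (mismatched) linear model, so no nonlinear optimization problem is solved explicitly.

```python
def plant(u):
    """Toy nonlinear plant, unknown to the controller (hypothetical)."""
    return 2.0 * u + 0.1 * u ** 2

H_MODEL = 2.0   # linear model of dy/du used by the controller (ignores curvature)
Y_TARGET = 4.0  # hypothetical output set point

def cost(u, y):
    """Track the output target while penalizing input effort."""
    return (y - Y_TARGET) ** 2 + 0.1 * u ** 2

def ofo_step(u, y_measured, eta=0.05, u_min=0.0, u_max=5.0):
    """One projected gradient step using the measured output and the linear model."""
    grad = 0.2 * u + H_MODEL * 2.0 * (y_measured - Y_TARGET)
    return min(max(u - eta * grad, u_min), u_max)

def run_ofo(u0=0.0, steps=100):
    u, history = u0, []
    for _ in range(steps):
        y = plant(u)                 # feedback: measure instead of simulating a model
        history.append(cost(u, y))
        u = ofo_step(u, y)
    return u, history

u_final, cost_history = run_ofo()
```

Because `H_MODEL` omits the plant's curvature, the iteration settles where the model-based gradient vanishes rather than at the true optimum. That gap is a small-scale analogue of the suboptimality from plant-model mismatch that the abstract proposes to reduce by learning the mismatch online.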

[22]  arXiv:2212.03616 [pdf, other]
Title: Image Compression With Learned Lifting-Based DWT and Learned Tree-Based Entropy Models
Comments: 11 pages, 17 figures
Subjects: Image and Video Processing (eess.IV); Multimedia (cs.MM)

This paper explores learned image compression based on traditional and learned discrete wavelet transform (DWT) architectures and learned entropy models for coding DWT subband coefficients. A learned DWT is obtained through the lifting scheme with learned nonlinear predict and update filters. Several learned entropy models are proposed to exploit inter and intra-DWT subband coefficient dependencies, akin to traditional EZW, SPIHT, or EBCOT algorithms. Experimental results show that when the proposed learned entropy models are combined with traditional wavelet filters, such as the CDF 9/7 filters, compression performance that far exceeds that of JPEG2000 can be achieved. When the learned entropy models are combined with the learned DWT, compression performance increases further. The computations in the learned DWT and all entropy models, except one, can be simply parallelized, and the systems provide practical encoding and decoding times on GPUs.
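
The lifting scheme mentioned in this abstract stays invertible for any choice of predict and update operators, which is what makes learned nonlinear filters possible. The sketch below shows one lifting step on a 1-D signal of even length. It is a generic illustration, not the paper's architecture: the function names are hypothetical, and the `tanh`-based operators merely stand in for learned nonlinear filters.

```python
import numpy as np

def lifting_forward(x, predict, update):
    """One lifting step: split into even/odd samples, then predict and update."""
    even, odd = x[0::2], x[1::2]
    detail = odd - predict(even)      # high-pass (detail) coefficients
    approx = even + update(detail)    # low-pass (approximation) coefficients
    return approx, detail

def lifting_inverse(approx, detail, predict, update):
    """Undo the lifting step by reversing the operations in opposite order."""
    even = approx - update(detail)
    odd = detail + predict(even)
    x = np.empty(even.size + odd.size, dtype=float)
    x[0::2], x[1::2] = even, odd
    return x

signal = np.array([1.0, 3.0, 2.0, 6.0, 5.0, 5.0, 0.0, 4.0])

# Haar-like linear operators: approx becomes the pairwise mean.
haar_approx, haar_detail = lifting_forward(signal, lambda e: e, lambda d: d / 2.0)

# Arbitrary nonlinear operators (stand-ins for learned filters).
nl_predict = lambda e: np.tanh(e)
nl_update = lambda d: 0.5 * np.tanh(d)
nl_approx, nl_detail = lifting_forward(signal, nl_predict, nl_update)
reconstructed = lifting_inverse(nl_approx, nl_detail, nl_predict, nl_update)
```

The inverse never needs to invert `predict` or `update` themselves; it only re-evaluates them on quantities it has already recovered, so reconstruction holds up to floating-point rounding regardless of how the operators are parameterized.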

[23]  arXiv:2212.03630 [pdf]
Title: One Sample Diffusion Model in Projection Domain for Low-Dose CT Imaging
Comments: 11 pages, 11 figures. arXiv admin note: text overlap with arXiv:2211.13926
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Low-dose computed tomography (CT) plays a significant role in reducing the radiation risk in clinical applications. However, lowering the radiation dose significantly degrades the image quality. The rapid development and wide application of deep learning have brought new directions for the development of low-dose CT imaging algorithms. We therefore propose a fully unsupervised one sample diffusion model (OSDM) in the projection domain for low-dose CT reconstruction. To extract sufficient prior information from a single sample, the Hankel matrix formulation is employed. In addition, penalized weighted least-squares and total variation are introduced to achieve superior image quality. Specifically, we first train a score-based generative model on one sinogram by extracting a large number of tensors from the structural-Hankel matrix as the network input to capture the prior distribution. Then, at the inference stage, the stochastic differential equation solver and data consistency step are performed iteratively to obtain the sinogram data. Finally, the image is obtained through the filtered back-projection algorithm. The reconstructed results approach their normal-dose counterparts. The results indicate that OSDM is a practical and effective model for reducing artifacts and preserving image quality.

[24]  arXiv:2212.03724 [pdf, other]
Title: Hierarchical Finite State Machines for Efficient Optimal Planning in Large-scale Systems
Comments: Submitted to ECC 2023
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)

In this paper, we consider a planning problem for a hierarchical finite state machine (HFSM) and develop an algorithm for efficiently computing optimal plans between any two states. The algorithm consists of an offline and an online step. In the offline step, one computes exit costs for each machine in the HFSM. This needs to be done only once for a given HFSM, and it is shown to have time complexity scaling linearly with the number of machines in the HFSM. In the online step, one computes an optimal plan from an initial state to a goal state by first reducing the HFSM (using the exit costs), computing an optimal trajectory for the reduced HFSM, and then expanding this trajectory to an optimal plan for the original HFSM. The time complexity of the online step scales near-linearly with the depth of the HFSM. It is argued that HFSMs arise naturally for large-scale control systems, exemplified by an application where a robot moves between houses to complete tasks. We compare our algorithm with Dijkstra's algorithm on HFSMs consisting of up to 2 million states, where our algorithm outperforms the latter, being several orders of magnitude faster.
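
For reference, the baseline named in the comparison above can be sketched as follows. This is plain Dijkstra's algorithm on a flat transition graph, not the hierarchical algorithm of the paper; the `transitions` dictionary layout and the function name `dijkstra` are assumptions chosen for illustration.

```python
import heapq


def dijkstra(transitions, start, goal):
    """Return (cost, path) of a cheapest plan, or (inf, []) if goal is unreachable.

    transitions maps a state to a dict {next_state: non-negative step cost}.
    """
    queue = [(0, start, [start])]
    settled = set()
    while queue:
        cost, state, path = heapq.heappop(queue)
        if state == goal:
            return cost, path
        if state in settled:
            continue  # a cheaper route to this state was already expanded
        settled.add(state)
        for nxt, step in transitions.get(state, {}).items():
            if nxt not in settled:
                heapq.heappush(queue, (cost + step, nxt, path + [nxt]))
    return float("inf"), []
```

On a flat graph every state may enter the queue, which is the cost a hierarchical method tries to avoid by reusing precomputed per-machine information.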

[25]  arXiv:2212.03729 [pdf, other]
Title: Enabling Resilient and Real-Time Network Operations in Space: A Novel Multi-Layer Satellite Networking Scheme
Authors: Peng Hu
Comments: Published in the Proceedings of the 2022 IEEE Latin-American Conference on Communications (LATINCOM), 30 November - 2 December 2022, Rio de Janeiro, Brazil
Subjects: Systems and Control (eess.SY); Networking and Internet Architecture (cs.NI)

Recently advanced low-Earth-orbit (LEO) satellite networks, represented by large constellations and advanced payloads, hold great promise for enabling high-quality Internet connectivity to any place on Earth. However, the traditional access-based approach to satellite operations cannot meet the pressing requirements of real-time, reliable, and resilient operations for LEO satellites. A new scheme is proposed based on multi-layer satellite networking considering the advanced Ka-band and optical communications payloads on a satellite platform. The proposed scheme can enable efficient and resilient message transmissions for critical telecommand and telemetry missions through different layers of satellite networks, which consist of LEO, medium-Earth-orbit (MEO), and geostationary (GEO) satellites. The proposed scheme is evaluated in a 24-hr satellite mission and shows performance improvements compared to the traditional operations approach.

[26]  arXiv:2212.03769 [pdf]
Title: INTERPRETER -- Tool for non-technical losses detection
Comments: 7 pages, 6 figures, 2 tables, conference paper
Journal-ref: 2022 IEEE Green Energy and Smart Systems Conference (IGESSC), 2022, pp. 1-6
Subjects: Systems and Control (eess.SY)

This article presents a tool for the detection of non-technical losses, which is being developed within the European INTERPRETER project. The tool employs a hybrid method based on feature detection from smart meter data and grid model analysis. This paper focuses on the grid model analysis, where voltage deviations between the grid model (digital twin) and real-world measurements at a low-voltage pilot site have been evaluated. Energy measurements from smart meters represent hourly mean power, while voltage measurements are instantaneous with uneven time intervals. Thus, measurements are not synchronous, which poses a major challenge for grid analysis. The proposed method focuses on daily mean, minimum, and maximum voltage and results show that deviations in daily minimum voltage are the most useful ones. A heatmap is developed, which helps the DSO expert to have a quick overview of all deviations of all meters in a certain time interval (1-day time step). A total of 6 locations have been identified where field inspections will be done.

[27]  arXiv:2212.03803 [pdf, other]
Title: Optimal Control Design for Operating a Hybrid PV Plant with Robust Power Reserves for Fast Frequency Regulation Services
Comments: Submitted to IEEE Transactions on Sustainable Energy
Subjects: Systems and Control (eess.SY)

This paper presents an optimal control strategy for operating a solar hybrid system consisting of solar photovoltaic (PV) and a high-power, low-storage battery energy storage system (BESS). A state-space model of the hybrid PV plant is first derived, based on which an adaptive model predictive controller is designed. The controller's objective is to control the PV and BESS to follow power setpoints sent to the hybrid system while maintaining desired power reserves and meeting system operational constraints. Furthermore, an extended Kalman filter (EKF) is implemented for estimating the battery SOC, and an error sensitivity analysis is performed to assess its limitations. To validate the proposed strategy, detailed EMT models of the hybrid system are developed so that losses and control limits can be quantified accurately. Day-long simulations are performed in an OPAL-RT real-time simulator using second-by-second actual PV farm data as inputs. Results verify that the proposed method can follow power setpoints while maintaining power reserves in days of high irradiance intermittency even with a small BESS storage.

[28]  arXiv:2212.03804 [pdf, other]
Title: A Frequency-Structure Approach for Link Stream Analysis
Comments: 16 pages, 9 figures
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Social and Information Networks (cs.SI)

A link stream is a set of triplets $(t, u, v)$ indicating that $u$ and $v$ interacted at time $t$. Link streams model numerous datasets and their proper study is crucial in many applications. In practice, raw link streams are often aggregated or transformed into time series or graphs where decisions are made. Yet, it remains unclear how the dynamical and structural information of a raw link stream carries into the transformed object. This work shows that it is possible to shed light on this question by studying link streams via algebraically linear graph and signal operators, for which we introduce a novel linear matrix framework for the analysis of link streams. We show that, due to their linearity, most methods in signal processing can be easily adopted by our framework to analyze the time/frequency information of link streams. However, the availability of linear graph methods to analyze relational/structural information is limited. We address this limitation by developing (i) a new basis for graphs that allows us to decompose them into structures at different resolution levels; and (ii) filters for graphs that allow us to change their structural information in a controlled manner. By plugging these developments and their time-domain counterparts into our framework, we are able to (i) obtain a new basis for link streams that allows us to represent them in a frequency-structure domain; and (ii) show that many interesting transformations to link streams, like the aggregation of interactions or their embedding into a Euclidean space, can be seen as simple filters in our frequency-structure domain.
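
To make the two aggregations mentioned above concrete, the sketch below collapses a small link stream of $(t, u, v)$ triplets into a time series and into a weighted graph. The function names, the choice to count interactions, and the treatment of pairs as undirected are assumptions for illustration; the paper's matrix framework is not reproduced here.

```python
from collections import Counter


def aggregate_time_series(stream):
    """Number of interactions at each time stamp (structure is discarded)."""
    return dict(sorted(Counter(t for t, _, _ in stream).items()))


def aggregate_graph(stream):
    """Number of interactions per undirected node pair (time is discarded)."""
    return dict(Counter(tuple(sorted((u, v))) for _, u, v in stream))
```

Each aggregation keeps one kind of information and throws the other away, which is the loss the abstract sets out to characterize.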

[29]  arXiv:2212.03824 [pdf, other]
Title: Adaptive Bayesian Beamforming for Imaging by Marginalizing the Speed of Sound
Subjects: Signal Processing (eess.SP)

Imaging methods based on array signal processing often require a fixed propagation speed of the medium, or speed of sound (SoS) for methods based on acoustic signals. The resolution of the images formed using these methods is strongly affected by the assumed SoS, which, due to multipath, nonlinear propagation, and non-uniform media, is challenging at best to select. In this letter, we propose a Bayesian approach to marginalize the influence of the SoS on beamformers for imaging. We adapt Bayesian direction-of-arrival estimation to an imaging setting and integrate a popular minimum variance beamformer over the posterior of the SoS. To solve the Bayesian integral efficiently, we use numerical Gauss quadrature. We apply our beamforming approach to shallow water sonar imaging where multipath and nonlinear propagation are abundant. We compare against the minimum variance distortionless response (MVDR) beamformer and demonstrate that its Bayesian counterpart achieves improved range and azimuthal resolution while effectively suppressing multipath artifacts.
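
The marginalization described above can be written schematically as follows. The symbols are assumptions introduced for illustration and are not taken from the paper: $y(x; c)$ is the beamformer output at image point $x$ for an assumed SoS $c$, $\mathbf{d}$ is the array data, and $(c_k, w_k)$ are quadrature nodes and weights on the support of the posterior.

```latex
\hat{y}(x)
  = \int y(x; c)\, p(c \mid \mathbf{d})\, \mathrm{d}c
  \;\approx\; \sum_{k=1}^{K} w_k \, y(x; c_k)\, p(c_k \mid \mathbf{d})
```

With a $K$-point Gauss rule the beamformer is evaluated at only $K$ candidate speeds, so the cost of marginalizing grows linearly in $K$.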

[30]  arXiv:2212.03839 [pdf, other]
Title: End-to-end Optimization of Constellation Shaping for Wiener Phase Noise Channels with a Differentiable Blind Phase Search
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)

As the demand for higher data throughput in coherent optical communication systems increases, we need to find ways to increase capacity in existing and future optical communication links. To address the demand for higher spectral efficiencies, we apply end-to-end optimization for joint geometric and probabilistic constellation shaping in the presence of Wiener phase noise and carrier phase estimation. Our approach follows state-of-the-art bitwise auto-encoders, which require a differentiable implementation of all operations between transmitter and receiver, including the DSP algorithms. In this work, we show how to modify the ubiquitous blind phase search (BPS) algorithm, a popular carrier phase estimation algorithm, to make it differentiable and include it in the end-to-end constellation shaping. By leveraging joint geometric and probabilistic constellation shaping, we are able to obtain a robust and pilot-free modulation scheme improving the performance of 64-ary communication systems by at least 0.1 bit/symbol compared to square QAM constellations with neural demappers and by 0.05 bit/symbol compared to previously presented approaches applying only geometric constellation shaping.

[31]  arXiv:2212.03854 [pdf, other]
Title: BiPMAP: A Toolbox for Predictions of Perceived Motion Artifacts on Modern Displays
Comments: 11 pages, 9 figures
Subjects: Signal Processing (eess.SP); Human-Computer Interaction (cs.HC); Quantitative Methods (q-bio.QM)

Presenting dynamic scenes without incurring motion artifacts visible to observers requires sustained effort from the display industry. A tool that predicts motion artifacts and simulates artifact elimination through optimizing the display configuration is highly desired to guide the design and manufacture of modern displays. Despite this demand, no such tool is available on the market. In this study, we deliver an interactive toolkit, Binocular Perceived Motion Artifact Predictor (BiPMAP), as an executable file with GPU acceleration. BiPMAP accounts for an extensive collection of user-defined parameters and directly visualizes a variety of motion artifacts by presenting the perceived continuous and sampled moving stimuli side-by-side. For accurate artifact predictions, BiPMAP utilizes a novel model of the human contrast sensitivity function to effectively imitate the frequency modulation of the human visual system. In addition, BiPMAP is capable of deriving various in-plane motion artifacts for 2D displays and depth distortion in 3D stereoscopic displays.

Cross-lists for Thu, 8 Dec 22

[32]  arXiv:2212.03288 (cross-list from cs.NI) [pdf]
Title: Estimation Large- Scale Fading Channels for Transmit Orthogonal Pilot Reuse Sequences in Massive MIMO System
Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)

Massive multiple-input multiple-output (MIMO) is a critical technology for future fifth-generation (5G) systems. Reducing pilot contamination (PC) enhances system performance, reduces inter-cell interference, and improves channel estimation. However, because the pilot sequences transmitted by users in a single cell to neighboring cells are not orthogonal, massive MIMO systems are still constrained. We propose channel evaluation using orthogonal pilot reuse sequences (PRS) and zero-forcing (ZF) pre-coding techniques to eliminate channel quality in end users with poor channel quality based on channel evaluation, large-scale shutdown evaluation, and analysis of maximum transmission efficiency. We derive lower bounds on the achievable downlink data rate (DR) and signal-to-interference-plus-noise ratio (SINR) based on PRS assignment to a group of users, where the number of antenna elements mitigates the interference as the number of antennas approaches infinity. Because of the channel coherence interval limitation, the orthogonal PRS cannot be allocated to all UEs in each cell. Short coherence intervals are able to reduce the PC and improve the channel quality. The modelling results show that a higher DR can be achieved due to better channel evaluation and lower loss.

[33]  arXiv:2212.03323 (cross-list from cs.RO) [pdf, other]
Title: Receding Horizon Planning with Rule Hierarchies for Autonomous Vehicles
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

Autonomous vehicles must often contend with conflicting planning requirements, e.g., safety and comfort could be at odds with each other if avoiding a collision calls for slamming the brakes. To resolve such conflicts, assigning importance ranking to rules (i.e., imposing a rule hierarchy) has been proposed, which, in turn, induces rankings on trajectories based on the importance of the rules they satisfy. On the one hand, imposing rule hierarchies can enhance interpretability but introduces combinatorial complexity to planning; on the other hand, differentiable reward structures can be leveraged by modern gradient-based optimization tools but are less interpretable and unintuitive to tune. In this paper, we present an approach to equivalently express rule hierarchies as differentiable reward structures amenable to modern gradient-based optimizers, thereby achieving the best of both worlds. We achieve this by formulating rank-preserving reward functions that are monotonic in the rank of the trajectories induced by the rule hierarchy; i.e., higher ranked trajectories receive higher reward. Equipped with a rule hierarchy and its corresponding rank-preserving reward function, we develop a two-stage planner that can efficiently resolve conflicting planning requirements. We demonstrate that our approach can generate motion plans at ~7-10 Hz for various challenging road navigation and intersection negotiation scenarios.

[34]  arXiv:2212.03329 (cross-list from cs.LG) [pdf, other]
Title: Enhancing Low-Density EEG-Based Brain-Computer Interfaces with Similarity-Keeping Knowledge Distillation
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Neurons and Cognition (q-bio.NC)

Electroencephalogram (EEG) has been one of the common neuromonitoring modalities for real-world brain-computer interfaces (BCIs) because of its non-invasiveness, low cost, and high temporal resolution. Recently, light-weight and portable EEG wearable devices based on low-density montages have increased the convenience and usability of BCI applications. However, loss of EEG decoding performance is often inevitable due to the reduced number of electrodes and coverage of scalp regions of a low-density EEG montage. To address this issue, we introduce knowledge distillation (KD), a learning mechanism developed for transferring knowledge/information between neural network models, to enhance the performance of low-density EEG decoding. Our framework includes a newly proposed similarity-keeping (SK) teacher-student KD scheme that encourages a low-density EEG student model to acquire the inter-sample similarity as in a pre-trained teacher model trained on high-density EEG data. The experimental results validate that our SK-KD framework consistently improves motor-imagery EEG decoding accuracy when the number of electrodes decreases for the input EEG data. For both common low-density headphone-like and headband-like montages, our method outperforms state-of-the-art KD methods across various EEG decoding model architectures. As the first KD scheme developed for enhancing EEG decoding, we foresee the proposed SK-KD framework to facilitate the practicality of low-density EEG-based BCI in real-world applications.

[35]  arXiv:2212.03357 (cross-list from cs.LG) [pdf, other]
Title: Contactless Oxygen Monitoring with Gated Transformer
Comments: 19 pages, Workshop on Learning from Time Series for Health, NeurIPS 2022
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)

With the increasing popularity of telehealth, it becomes critical to ensure that basic physiological signals can be monitored accurately at home, with minimal patient overhead. In this paper, we propose a contactless approach for monitoring patients' blood oxygen at home, simply by analyzing the radio signals in the room, without any wearable devices. We extract the patients' respiration from the radio signals that bounce off their bodies and devise a novel neural network that infers a patient's oxygen estimates from their breathing signal. Our model, called \emph{Gated BERT-UNet}, is designed to adapt to the patient's medical indices (e.g., gender, sleep stages). It has multiple predictive heads and selects the most suitable head via a gate controlled by the person's physiological indices. Extensive empirical results show that our model achieves high accuracy on both medical and radio datasets.

[36]  arXiv:2212.03390 (cross-list from cs.LG) [pdf, ps, other]
Title: A Temporal Graph Neural Network for Cyber Attack Detection and Localization in Smart Grids
Comments: 5 pages, 6 figures, accepted at ISGT conference of 2023
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Systems and Control (eess.SY)

This paper presents a Temporal Graph Neural Network (TGNN) framework for detection and localization of false data injection and ramp attacks on the system state in smart grids. Capturing the topological information of the system through the GNN framework along with the state measurements can improve the performance of the detection mechanism. The problem is formulated as a classification problem through a GNN with a message passing mechanism to identify abnormal measurements. The residual block used in the aggregation process of message passing and the gated recurrent unit can lead to improved computational time and performance. The proposed model has been evaluated through extensive simulations of power system states and attack scenarios, showing promising performance. The sensitivity of the model to the intensity and location of the attacks and the model's detection delay versus detection accuracy have also been evaluated.

[37]  arXiv:2212.03420 (cross-list from cs.RO) [pdf, other]
Title: What Happens When Pneu-Net Soft Robotic Actuators Get Fatigued?
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

Soft actuators have attracted a great deal of interest in the context of rehabilitative and assistive robots for increasing safety and lowering costs as compared to rigid-body robotic systems. During actuation, soft actuators experience high levels of deformation, which can lead to microscale fractures in their elastomeric structure; these fatigue the system over time and eventually lead to macroscale damage and failure. This paper reports finite element modeling (FEM) of pneu-nets at high angles, along with repetitive experimentation at high deformation rates, in order to study the effect and behavior of fatigue in soft robotic actuators, which would result in deviation from the ideal behavior. Comparing the FEM model and experimental data, we show that FEM can model the performance of the actuator before fatigue to a bending angle of 167 degrees with ~96% accuracy. We also show that the FEM model performance will drop to 80% due to fatigue after repetitive high-angle bending. The results of this paper objectively highlight the emergence of fatigue over cyclic activation of the system and the resulting deviation from the computational FEM model. Such behavior can be considered in future controllers to adapt the system with time-variable and non-autonomous response dynamics of soft robots.

[38]  arXiv:2212.03435 (cross-list from cs.SD) [pdf, other]
Title: Improve Bilingual TTS Using Dynamic Language and Phonology Embedding
Comments: Submitted to ICASSP2023
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)

In most cases, bilingual TTS needs to handle three types of input scripts: first language only, second language only, and second language embedded in the first language. In the latter two situations, the pronunciation and intonation of the second language are usually quite different due to the influence of the first language. Therefore, it is a big challenge to accurately model the pronunciation and intonation of the second language in different contexts without mutual interference. This paper builds a Mandarin-English TTS system to acquire more standard spoken English speech from a monolingual Chinese speaker. We introduce a phonology embedding to capture how English differs across phonologies. An embedding mask is applied to the language embedding to distinguish information between different languages and to the phonology embedding to focus on English expression. We specially design an embedding strength modulator to capture the dynamic strength of language and phonology. Experiments show that our approach can produce significantly more natural and standard spoken English speech of the monolingual Chinese speaker. From analysis, we find that suitable phonology control contributes to better performance in different scenarios.

[39]  arXiv:2212.03550 (cross-list from cs.RO) [pdf]
Title: Support Vector Machine for Determining Euler Angles in an Inertial Navigation System
Authors: Aleksandr N. Grekov (1) (2), Aleksei A. Kabanov (2), Sergei Yu. Alekseev (1), ((1) Institute of Natural and Technical Systems, (2) Sevastopol State University)
Comments: 7 pages, 5 figures, 5 formulas
Journal-ref: Monitoring systems of environment 4(46),2021: 134-142
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Signal Processing (eess.SP); Systems and Control (eess.SY); Instrumentation and Detectors (physics.ins-det)

The paper discusses the improvement of the accuracy of an inertial navigation system created on the basis of MEMS sensors using machine learning (ML) methods. As input data for the classifier, we used information obtained from a developed laboratory setup with MEMS sensors on a sealed platform with the ability to adjust its tilt angles. To assess the effectiveness of the models, test curves were constructed with different values of the model parameters for each kernel: linear, polynomial, and radial basis function. The inverse regularization parameter was used as a parameter. The proposed ML-based algorithm has demonstrated its ability to classify correctly in the presence of noise typical for MEMS sensors, with good classification results obtained when the optimal values of the hyperparameters are chosen.

[40]  arXiv:2212.03558 (cross-list from cs.CL) [pdf, other]
Title: Low-Resource End-to-end Sanskrit TTS using Tacotron2, WaveGlow and Transfer Learning
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)

End-to-end text-to-speech (TTS) systems have been developed for European languages like English and Spanish with state-of-the-art speech quality, prosody, and naturalness. However, development of end-to-end TTS for Indian languages is lagging behind in terms of quality. The challenges involved in such a task are: 1) scarcity of quality training data; 2) low efficiency during training and inference; 3) slow convergence in the case of large vocabulary size. In the work reported in this paper, we investigate fine-tuning the English-pretrained Tacotron2 model with limited Sanskrit data to synthesize natural-sounding speech in Sanskrit in low-resource settings. Our experiments show encouraging results, achieving an overall MOS of 3.38 from 37 evaluators with good Sanskrit spoken knowledge. This is a strong result given that only 2.5 hours of speech data were used.

[41]  arXiv:2212.03657 (cross-list from cs.CL) [pdf, other]
Title: M3ST: Mix at Three Levels for Speech Translation
Comments: Submitted to ICASSP 2023
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)

How can the data scarcity problem for end-to-end speech-to-text translation (ST) be solved? It is well known that data augmentation is an efficient method to improve performance for many tasks by enlarging the dataset. In this paper, we propose the Mix at three levels for Speech Translation (M^3ST) method to increase the diversity of the augmented training corpus. Specifically, we conduct two stages of fine-tuning based on a pre-trained model using external machine translation (MT) data. In the first stage of fine-tuning, we mix the training corpus at three levels, including word level, sentence level and frame level, and fine-tune the entire model with mixed data. At the second stage of fine-tuning, we take both original speech sequences and original text sequences in parallel into the model to fine-tune the network, and use Jensen-Shannon divergence to regularize their outputs. Experiments on the MuST-C speech translation benchmark and analysis show that M^3ST outperforms current strong baselines and achieves state-of-the-art results on eight directions with an average BLEU of 29.9.
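
For readers unfamiliar with the regularizer named above, the sketch below computes the Jensen-Shannon divergence between two discrete probability distributions. It is a generic textbook definition using natural logarithms; the function names are assumptions, and how the paper applies the divergence to model outputs is not specified here.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions."""
    # Terms with p_i == 0 contribute nothing by the usual convention.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, bounded above by log(2)."""
    mixture = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, mixture) + 0.5 * kl_divergence(q, mixture)
```

Unlike the KL divergence, this quantity is symmetric and stays finite when the two distributions have different supports, which makes it a convenient penalty for pulling two output distributions together.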

[42]  arXiv:2212.03658 (cross-list from cs.CV) [pdf, other]
Title: Learning Double-Compression Video Fingerprints Left from Social-Media Platforms
Journal-ref: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Social and Information Networks (cs.SI); Image and Video Processing (eess.IV)

Social media and messaging apps have become major communication platforms. Multimedia content promotes improved user engagement and has thus become a very important communication tool. However, fake news and manipulated content can easily go viral, so being able to verify the source of videos and images, as well as to distinguish between native and downloaded content, becomes essential. Most of the work performed so far on social media provenance has concentrated on images; in this paper, we propose a CNN architecture that analyzes video content to trace videos back to their social network of origin. The experiments demonstrate that stating platform provenance is possible for videos as well as images with very good accuracy.

[43]  arXiv:2212.03741 (cross-list from cs.CV) [pdf, other]
Title: Magic: Multi Art Genre Intelligent Choreography Dataset and Network for 3D Dance Generation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Achieving multiple genres and long-term choreography sequences from given music is a challenging task, due to the lack of a multi-genre dataset. To tackle this problem, we propose a Multi Art Genre Intelligent Choreography Dataset (MagicDance). The data of MagicDance is captured from professional dancers assisted by motion capture technicians. It has a total of 8 hours of 3D motion-capture human dances with paired music, and 16 different dance genres. To the best of our knowledge, MagicDance is the 3D dance dataset with the most genres. In addition, we find that the two existing types of methods (generation-based and synthesis-based) can each satisfy only one of diversity and duration, but they can complement each other to some extent. Based on this observation, we also propose a generation-synthesis choreography network (MagicNet), which cascades a Diffusion-based 3D Diverse Dance fragments Generation Network (3DGNet) and a Genre&Coherent aware Retrieval Module (GCRM). The former can generate various dance fragments from only one music clip. The latter is utilized to select the best dance fragment generated by 3DGNet and switch them into a complete dance according to the genre and coherent matching score. Quantitative and qualitative experiments demonstrate the quality of MagicDance, and the state-of-the-art performance of MagicNet.

[44]  arXiv:2212.03752 (cross-list from cs.CV) [pdf, other]
Title: GLeaD: Improving GANs with A Generator-Leading Task
Comments: Project page: this https URL Code: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

A generative adversarial network (GAN) is formulated as a two-player game between a generator (G) and a discriminator (D), where D is asked to differentiate whether an image comes from real data or is produced by G. Under such a formulation, D plays as the rule maker and hence tends to dominate the competition. Towards a fairer game in GANs, we propose a new paradigm for adversarial training, which makes G assign a task to D as well. Specifically, given an image, we expect D to extract representative features that can be adequately decoded by G to reconstruct the input. That way, instead of learning freely, D is urged to align with the view of G for domain classification. Experimental results on various datasets demonstrate the substantial superiority of our approach over the baselines. For instance, we improve the FID of StyleGAN2 from 4.30 to 2.55 on LSUN Bedroom and from 4.04 to 2.82 on LSUN Church. We believe that the pioneering attempt presented in this work could inspire the community with better designed generator-leading tasks for GAN improvement.

[45]  arXiv:2212.03765 (cross-list from cs.LG) [pdf, other]
Title: Generalized Gradient Flows with Provable Fixed-Time Convergence and Fast Evasion of Non-Degenerate Saddle Points
Comments: 11 pages, 3 figures, under review
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Systems and Control (eess.SY); Optimization and Control (math.OC); Machine Learning (stat.ML)

Gradient-based first-order convex optimization algorithms find widespread applicability in a variety of domains, including machine learning tasks. Motivated by the recent advances in fixed-time stability theory of continuous-time dynamical systems, we introduce a generalized framework for designing accelerated optimization algorithms with the strongest convergence guarantees that further extend to a subclass of non-convex functions. In particular, we introduce the \emph{GenFlow} algorithm and its momentum variant that provably converge to the optimal solution of objective functions satisfying the Polyak-{\L}ojasiewicz (PL) inequality in fixed time. Moreover, for functions that admit non-degenerate saddle points, we show that for the proposed GenFlow algorithm, the time required to evade these saddle points is bounded uniformly for all initial conditions. Finally, for strongly convex-strongly concave minimax problems whose optimal solution is a saddle point, a similar scheme is shown to arrive at the optimal solution, again in fixed time. The superior convergence properties of our algorithm are validated experimentally on a variety of benchmark datasets.
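
For context, the PL inequality referred to above is usually stated as follows. The notation is an assumption for illustration ($f^\star$ is the minimum value of $f$ and $\mu > 0$ is the PL constant), and the second line shows only the standard exponential rate of the plain gradient flow $\dot{x} = -\nabla f(x)$, not the GenFlow dynamics of the paper.

```latex
\tfrac{1}{2}\,\lVert \nabla f(x) \rVert^{2} \;\ge\; \mu \,\bigl(f(x) - f^\star\bigr)
  \quad \text{for all } x,
\qquad\Longrightarrow\qquad
f(x(t)) - f^\star \;\le\; e^{-2\mu t}\,\bigl(f(x(0)) - f^\star\bigr).
```

The implication follows from $\tfrac{\mathrm{d}}{\mathrm{d}t}\bigl(f(x(t)) - f^\star\bigr) = -\lVert \nabla f(x(t)) \rVert^{2} \le -2\mu\,\bigl(f(x(t)) - f^\star\bigr)$. Under this flow the time to reach a given accuracy grows with the initial gap, whereas a fixed-time guarantee bounds the convergence time uniformly over initial conditions.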

[46]  arXiv:2212.03812 (cross-list from cs.CL) [pdf, other]
Title: An Overview of Indian Spoken Language Recognition from Machine Learning Perspective
Comments: Accepted for publication in ACM Transactions on Asian and Low-Resource Language Information Processing
Journal-ref: ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 21, Issue 6 November 2022, Article No 128
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Automatic spoken language identification (LID) is a very important research field in the era of multilingual voice-command-based human-computer interaction (HCI). A front-end LID module helps to improve the performance of many speech-based applications in the multilingual scenario. India is a populous country with diverse cultures and languages. The majority of the Indian population needs to use their respective native languages for verbal interaction with machines. Therefore, the development of efficient Indian spoken language recognition systems is useful for adapting smart technologies in every section of Indian society. The field of Indian LID has started gaining momentum in the last two decades, mainly due to the development of several standard multilingual speech corpora for the Indian languages. Even though significant research progress has already been made in this field, to the best of our knowledge, there have not been many attempts to review it analytically as a whole. In this work, we make one of the first attempts to present a comprehensive review of the Indian spoken language recognition research field. An in-depth analysis is presented to emphasize the unique challenges of low-resource conditions and mutual influences in developing LID systems in the Indian context. Several essential aspects of Indian LID research are discussed: a detailed description of the available speech corpora; the major research contributions, from the earlier attempts based on statistical modeling to the recent approaches based on different neural network architectures; and future research trends. This review will help active researchers and research enthusiasts from related fields assess the present state of Indian LID research.

[47]  arXiv:2212.03814 (cross-list from cs.CV) [pdf, other]
Title: iQuery: Instruments as Queries for Audio-Visual Sound Separation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Current audio-visual separation methods share a standard architecture design where an audio encoder-decoder network is fused with visual encoding features at the encoder bottleneck. This design confounds the learning of multi-modal feature encoding with robust sound decoding for audio separation. To generalize to a new instrument, one must finetune the entire visual and audio network for all musical instruments. We reformulate the visual-sound separation task and propose Instrument as Query (iQuery) with a flexible query expansion mechanism. Our approach ensures cross-modal consistency and cross-instrument disentanglement. We utilize "visually named" queries to initiate the learning of audio queries and use cross-modal attention to remove potential sound source interference at the estimated waveforms. To generalize to a new instrument or event class, drawing inspiration from the text-prompt design, we insert an additional query as an audio prompt while freezing the attention mechanism. Experimental results on three benchmarks demonstrate that our iQuery improves audio-visual sound source separation performance.

[48]  arXiv:2212.03817 (cross-list from cs.CL) [pdf, other]
Title: A Transformer-Based User Satisfaction Prediction for Proactive Interaction Mechanism in DuerOS
Comments: Accepted by CIKM-22
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

Recently, spoken dialogue systems have been widely deployed in a variety of applications, serving a huge number of end-users. A common issue is that the errors resulting from noisy utterances, semantic misunderstandings, or lack of knowledge make it hard for a real system to respond properly, possibly leading to an unsatisfactory user experience. To avoid such a case, we consider a proactive interaction mechanism where the system predicts the user satisfaction with the candidate response before giving it to the user. If the user is not likely to be satisfied according to the prediction, the system will ask the user a suitable question to determine the real intent of the user instead of providing the response directly. With such an interaction with the user, the system can give a better response to the user. Previous models that predict the user satisfaction are not applicable to DuerOS which is a large-scale commercial dialogue system. They are based on hand-crafted features and thus can hardly learn the complex patterns lying behind millions of conversations and temporal dependency in multiple turns of the conversation. Moreover, they are trained and evaluated on the benchmark datasets with adequate labels, which are expensive to obtain in a commercial dialogue system. To face these challenges, we propose a pipeline to predict the user satisfaction to help DuerOS decide whether to ask for clarification in each turn. Specifically, we propose to first generate a large number of weak labels and then train a transformer-based model to predict the user satisfaction with these weak labels. Empirically, we deploy and evaluate our model on DuerOS, and observe a 19% relative improvement on the accuracy of user satisfaction prediction and 2.3% relative improvement on user experience.

[49]  arXiv:2212.03826 (cross-list from cs.CV) [pdf, other]
Title: Unsupervised Domain Adaptation for Semantic Segmentation using One-shot Image-to-Image Translation via Latent Representation Mixing
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Domain adaptation is one of the prominent strategies for handling both domain shift, which is widely encountered in large-scale land use/land cover map calculation, and the scarcity of pixel-level ground truth that is crucial for supervised semantic segmentation. Studies focusing on adversarial domain adaptation via re-styling source domain samples, commonly through generative adversarial networks, have reported varying levels of success, yet they suffer from semantic inconsistencies and visual corruptions, and often require a large number of target domain samples. In this letter, we propose a new unsupervised domain adaptation method for the semantic segmentation of very high resolution images that i) leads to semantically consistent and noise-free images, ii) operates with a single target domain sample (i.e., one-shot), and iii) requires a fraction of the number of parameters of state-of-the-art methods. More specifically, an image-to-image translation paradigm is proposed, based on an encoder-decoder principle where latent content representations are mixed across domains, and a perceptual network module and loss function are further introduced to enforce semantic consistency. Cross-city comparative experiments have shown that the proposed method outperforms state-of-the-art domain adaptation methods. Our source code will be available at https://github.com/Sarmadfismael/LRM_I2I.

[50]  arXiv:2212.03848 (cross-list from cs.CV) [pdf, other]
Title: NeRFEditor: Differentiable Style Decomposition for Full 3D Scene Editing
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Image and Video Processing (eess.IV)

We present NeRFEditor, an efficient learning framework for 3D scene editing, which takes a video captured over 360° as input and outputs a high-quality, identity-preserving stylized 3D scene. Our method supports diverse types of editing, such as editing guided by reference images, text prompts, and user interactions. We achieve this by encouraging a pre-trained StyleGAN model and a NeRF model to learn from each other mutually. Specifically, we use a NeRF model to generate numerous image-angle pairs to train an adjustor, which can adjust the StyleGAN latent code to generate high-fidelity stylized images for any given angle. To extrapolate editing to GAN out-of-domain views, we devise another module that is trained in a self-supervised learning manner. This module maps novel-view images to the hidden space of StyleGAN, which allows StyleGAN to generate stylized images on novel views. These two modules together produce guided images in 360° views to finetune a NeRF to make stylization effects, for which a stable fine-tuning strategy is proposed. Experiments show that NeRFEditor outperforms prior work on benchmark and real-world scenes with better editability, fidelity, and identity preservation.

Replacements for Thu, 8 Dec 22

[51]  arXiv:2008.04938 (replaced) [pdf, ps, other]
Title: Content-based Music Similarity with Triplet Networks
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[52]  arXiv:2106.07736 (replaced) [pdf, ps, other]
Title: Unique sparse decomposition of low rank matrices
Comments: Accepted by NeurIPS 2021, in IEEE Transactions on Information Theory, 2022
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Signal Processing (eess.SP); Numerical Analysis (math.NA)
[53]  arXiv:2107.11664 (replaced) [pdf, other]
Title: Utilizing the Structure of the Curvelet Transform with Compressed Sensing
Subjects: Image and Video Processing (eess.IV)
[54]  arXiv:2111.13154 (replaced) [pdf, other]
Title: Country-wide Retrieval of Forest Structure From Optical and SAR Satellite Imagery With Deep Ensembles
Journal-ref: ISPRS Journal of Photogrammetry and Remote Sensing, Volume 195, January 2023, Pages 269-286
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
[55]  arXiv:2202.13870 (replaced) [pdf, other]
Title: Simulating Network Paths with Recurrent Buffering Units
Comments: Accepted in AAAI 2023, 19 pages, 14 figures
Subjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG); Systems and Control (eess.SY)
[56]  arXiv:2203.06250 (replaced) [pdf, other]
Title: Combining imitation and deep reinforcement learning to accomplish human-level performance on a virtual foraging task
Comments: 24 pages, 15 figures
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
[57]  arXiv:2204.10070 (replaced) [pdf, other]
Title: Multi-UAV trajectory planning for 3D visual inspection of complex structures
Comments: Revised abstract, references and captions, 17 pages
Subjects: Optimization and Control (math.OC); Multiagent Systems (cs.MA); Robotics (cs.RO); Systems and Control (eess.SY)
[58]  arXiv:2205.12448 (replaced) [pdf, ps, other]
Title: Transportation-Inequalities, Lyapunov Stability and Sampling for Dynamical Systems on Continuous State Space
Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Machine Learning (cs.LG); Systems and Control (eess.SY)
[59]  arXiv:2207.10805 (replaced) [pdf, ps, other]
Title: PowerFDNet: Deep Learning-Based Stealthy False Data Injection Attack Detection for AC-model Transmission Systems
Journal-ref: IEEE Open Journal of the Computer Society, 2022
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Systems and Control (eess.SY)
[60]  arXiv:2208.10011 (replaced) [pdf, ps, other]
Title: A Two-phase On-line Joint Scheduling for Welfare Maximization of Charging Station
Subjects: Systems and Control (eess.SY)
[61]  arXiv:2208.14017 (replaced) [pdf, ps, other]
Title: Gridless 3D Recovery of Image Sources from Room Impulse Responses
Authors: Tom Sprunck (IRMA, TONUS), Yannick Privat (IRMA, TONUS), Cédric Foy (UMRAE), Antoine Deleforge (MULTISPEECH)
Comments: IEEE Signal Processing Letters, 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP); Classical Physics (physics.class-ph)
[62]  arXiv:2209.03700 (replaced) [pdf, ps, other]
Title: Ziv-Zakai Bound for DOAs Estimation
Subjects: Signal Processing (eess.SP)
[63]  arXiv:2209.10046 (replaced) [pdf, other]
Title: Contractivity of the Method of Successive Approximations for Optimal Control
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
[64]  arXiv:2209.11866 (replaced) [pdf, other]
Title: ControlVC: Zero-Shot Voice Conversion with Time-Varying Controls on Pitch and Speed
Comments: Audio samples: this https URL; Code: this https URL
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[65]  arXiv:2209.13413 (replaced) [pdf, other]
Title: Reinforcement Learning with Non-Exponential Discounting
Comments: 22 pages, 3 figures, published at 36th Conference on Neural Information Processing Systems (NeurIPS 2022)
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY); Neurons and Cognition (q-bio.NC); Machine Learning (stat.ML)
[66]  arXiv:2210.05989 (replaced) [pdf, other]
Title: Probabilities Are Not Enough: Formal Controller Synthesis for Stochastic Dynamical Models with Epistemic Uncertainty
Comments: Accepted at AAAI 2023
Subjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI); Robotics (cs.RO)
[67]  arXiv:2210.09659 (replaced) [pdf, ps, other]
Title: Large-Scale Bandwidth and Power Optimization for Multi-Modal Edge Intelligence Autonomous Driving
Comments: Submitted to IEEE
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)
[68]  arXiv:2211.17221 (replaced) [pdf, other]
Title: Interval Valued Fuzzy Modeling and Indirect Adaptive Control of Quadrotor
Comments: 25 pages
Subjects: Systems and Control (eess.SY)
[69]  arXiv:2212.00743 (replaced) [pdf, other]
Title: Transformer-based Hand Gesture Recognition via High-Density EMG Signals: From Instantaneous Recognition to Fusion of Motor Unit Spike Trains
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)
[70]  arXiv:2212.02186 (replaced) [pdf, ps, other]
Title: Three High-rate Beamforming Methods for Active IRS-aided Wireless Network
Subjects: Signal Processing (eess.SP)
[71]  arXiv:2212.02610 (replaced) [pdf, other]
Title: Audio Latent Space Cartography
Comments: Late Breaking / Demo, ISMIR 2022 (this https URL)
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)