Electrical Engineering and Systems Science

Authors and titles for recent submissions, skipping first 319

Thu, 6 Jun 2024 (continued, showing 25 of 83 entries)

[320] arXiv:2406.02652 [pdf, other]: Title: RepCNN: Micro-sized, Mighty Models for Wakeword Detection

Authors: Arnav Kundu, Prateeth Nayak, Hywel Richards, Priyanka Padmanabhan, Devang Naik

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[321] arXiv:2406.02649 [pdf, other]: Title: Keyword-Guided Adaptation of Automatic Speech Recognition

Authors: Aviv Shamsian, Aviv Navon, Neta Glazer, Gill Hetz, Joseph Keshet

Comments: Accepted to InterSpeech 2024

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[322] arXiv:2406.02640 [pdf, other]: Title: Ghost imaging-based Non-contact Heart Rate Detection

Authors: Jianming Yu, Yuchen He, Bin Li, Hui Chen, Huaibin Zheng, Jianbin Liu, Zhuo Xu

Comments: 4 pages, 6 figures

Subjects: Image and Video Processing (eess.IV); Medical Physics (physics.med-ph); Optics (physics.optics)
[323] arXiv:2406.02626 [pdf, ps, other]: Title: A Brief Overview of Optimization-Based Algorithms for MRI Reconstruction Using Deep Learning

Authors: Wanyu Bian

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Optimization and Control (math.OC)
[324] arXiv:2406.02608 [pdf, other]: Title: PPINtonus: Early Detection of Parkinson's Disease Using Deep-Learning Tonal Analysis

Authors: Varun Reddy

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[325] arXiv:2406.02572 [pdf, other]: Title: Selfsupervised learning for pathological speech detection

Authors: Shakeel Ahmad Sheikh

Comments: in Intersection of Book Chapter in Machine Learning and Computational Social Sciences CRC (in progress) 2024

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[326] arXiv:2406.02569 [pdf, other]: Title: Cluster-to-Predict Affect Contours from Speech

Authors: Gökhan Kuşçu, Engin Erzin

Comments: 8 pages, 3 figures

Subjects: Audio and Speech Processing (eess.AS); Human-Computer Interaction (cs.HC)
[327] arXiv:2406.02566 [pdf, other]: Title: Combining X-Vectors and Bayesian Batch Active Learning: Two-Stage Active Learning Pipeline for Speech Recognition

Authors: Ognjen Kundacina, Vladimir Vincan, Dragisa Miskovic

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
[328] arXiv:2406.02563 [pdf, other]: Title: A cost minimization approach to fix the vocabulary size in a tokenizer for an End-to-End ASR system

Authors: Sunil Kumar Kopparapu, Ashish Panda

Comments: 5 pages, 4 figures

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[329] arXiv:2406.02562 [pdf, other]: Title: Gated Low-rank Adaptation for personalized Code-Switching Automatic Speech Recognition on the low-spec devices

Authors: Gwantae Kim, Bokyeung Lee, Donghyeon Kim, Hanseok Ko

Comments: Table 2 is revised

Journal-ref: ICASSP 2024 Workshop(HSCMA 2024) paper

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[330] arXiv:2406.02561 [pdf, ps, other]: Title: Breaking Walls: Pioneering Automatic Speech Recognition for Central Kurdish: End-to-End Transformer Paradigm

Authors: Abdulhady Abas Abdullah, Hadi Veisi, Tarik Rashid

Subjects: Audio and Speech Processing (eess.AS)
[331] arXiv:2406.02560 [pdf, other]: Title: Less Peaky and More Accurate CTC Forced Alignment by Label Priors

Authors: Ruizhe Huang, Xiaohui Zhang, Zhaoheng Ni, Li Sun, Moto Hira, Jeff Hwang, Vimal Manohar, Vineel Pratap, Matthew Wiesner, Shinji Watanabe, Daniel Povey, Sanjeev Khudanpur

Comments: Accepted by ICASSP 2024. Github repo: this https URL

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
[332] arXiv:2406.02557 [pdf, other]: Title: EVAN: Evolutional Video Streaming Adaptation via Neural Representation

Authors: Mufan Liu, Le Yang, Yiling Xu, Ye-kui Wang, Jenq-Neng Hwang

Comments: accepted by ICME (conference)

Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[333] arXiv:2406.02555 [pdf, ps, other]: Title: PhoWhisper: Automatic Speech Recognition for Vietnamese

Authors: Thanh-Thien Le, Linh The Nguyen, Dat Quoc Nguyen

Comments: Accepted to ICLR 2024 Tiny Papers Track

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL)
[334] arXiv:2406.02554 [pdf, other]: Title: Hear Me, See Me, Understand Me: Audio-Visual Autism Behavior Recognition

Authors: Shijian Deng, Erin E. Kosloski, Siddhi Patel, Zeke A. Barnett, Yiyang Nan, Alexander Kaplan, Sisira Aarukapalli, William T. Doan, Matthew Wang, Harsh Singh, Pamela R. Rollins, Yapeng Tian

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
[335] arXiv:2406.03472 (cross-list from cs.LG) [pdf, other]: Title: Solving Differential Equations using Physics-Informed Deep Equilibrium Models

Authors: Bruno Machado Pacheco, Eduardo Camponogara

Comments: Accepted at CASE 2024

Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
[336] arXiv:2406.03461 (cross-list from cs.CV) [pdf, other]: Title: Polarization Wavefront Lidar: Learning Large Scene Reconstruction from Polarized Wavefronts

Authors: Dominik Scheuble, Chenyang Lei, Seung-Hwan Baek, Mario Bijelic, Felix Heide

Comments: Accepted at CVPR 2024; Project Website: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[337] arXiv:2406.03438 (cross-list from cs.IT) [pdf, other]: Title: CSI-GPT: Integrating Generative Pre-Trained Transformer with Federated-Tuning to Acquire Downlink Massive MIMO Channels

Authors: Ye Zeng, Li Qiao, Zhen Gao, Tong Qin, Zhonghuai Wu, Sheng Chen, Mohsen Guizani

Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
[338] arXiv:2406.03407 (cross-list from cs.LG) [pdf, other]: Title: Physics and geometry informed neural operator network with application to acoustic scattering

Authors: Siddharth Nair, Timothy F. Walsh, Greg Pickrell, Fabio Semperlotti

Comments: 20 pages of main text, 9 figures

Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Computational Physics (physics.comp-ph)
[339] arXiv:2406.03405 (cross-list from cs.LG) [pdf, ps, other]: Title: Amalgam: A Framework for Obfuscated Neural Network Training on the Cloud

Authors: Sifat Ut Taki, Spyridon Mastorakis

Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Systems and Control (eess.SY)
[340] arXiv:2406.03344 (cross-list from cs.SD) [pdf, other]: Title: Audio Mamba: Bidirectional State Space Model for Audio Representation Learning

Authors: Mehmet Hamza Erol, Arda Senocak, Jiu Feng, Joon Son Chung

Comments: Code is available at this https URL

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[341] arXiv:2406.03270 (cross-list from math.OC) [pdf, other]: Title: A Successive Gap Constraint Linearization Method for Optimal Control Problems with Equilibrium Constraints

Authors: Kangyu Lin, Toshiyuki Ohtsuka

Comments: Forthcoming, (Accepted to the 2024 IFAC Conference on Nonlinear Model Predictive Control (NMPC))

Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
[342] arXiv:2406.03251 (cross-list from cs.SD) [pdf, other]: Title: ASoBO: Attentive Beamformer Selection for Distant Speaker Diarization in Meetings

Authors: Theo Mariotte, Anthony Larcher, Silvio Montresor, Jean-Hugh Thomas

Comments: 5 pages, 2 figures, 2 tables, accepted at Interspeech 2024

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[343] arXiv:2406.03247 (cross-list from cs.SD) [pdf, other]: Title: Genuine-Focused Learning using Mask AutoEncoder for Generalized Fake Audio Detection

Authors: Xiaopeng Wang, Ruibo Fu, Zhengqi Wen, Zhiyong Wang, Yuankun Xie, Yukun Liu, Jianhua Tao, Xuefei Liu, Yongwei Li, Xin Qi, Yi Lu, Shuchen Shi

Comments: Accepted by INTERSPEECH 2024

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[344] arXiv:2406.03240 (cross-list from cs.SD) [pdf, other]: Title: Generalized Source Tracing: Detecting Novel Audio Deepfake Algorithm with Real Emphasis and Fake Dispersion Strategy

Authors: Yuankun Xie, Ruibo Fu, Zhengqi Wen, Zhiyong Wang, Xiaopeng Wang, Haonnan Cheng, Long Ye, Jianhua Tao

Comments: Accepted by INTERSPEECH 2024

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
