Papers similar to 2605.02715v1

Similar to 2605.02715v1 · 11 results

cs.SD · cs.AI · cs.CR · Recent · Jun 4, 2026

Beyond Waveform Robustness: Robust Feature-Vocoder Adversarial Attacks on Automatic Speech Recognition

Yifan Liao, Zongmin Zhang, Zhen Sun, Yuhui Sun +2 more

The paper introduces a novel Clean-Referenced Feature-Vocoder Attack, a black-box adversarial attack that perturbs high-level SSL feature representations instead of raw audio waveforms, achieving supe…

cs.LG · cs.CR · eess.AS · Recent · Mar 23, 2026

Precision-Varying Prediction (PVP): Robustifying ASR systems against adversarial attacks

Matías Pizarro, Raghavan Narasimhan, Asja Fischer

This paper proposes using random sampling of prediction precision during inference to significantly enhance the adversarial robustness of Automatic Speech Recognition (ASR) systems.

eess.AS · cs.CL · cs.SD · Recent · May 30, 2026

Local Diagnostics of Continuous Normalizing Flow for Out-of-Distribution Detection

Xinwei Cao, Mengxuan Lu, Torbjørn Svendsen, Giampiero Salvi

The paper proposes a Lagrangian sub-flow (LSF) framework and geometric diagnostic signals to improve out-of-distribution detection using Continuous Normalizing Flows, overcoming the likelihood paradox…

cs.CR · cs.SD · eess.AS · Recent · May 18, 2026

Escaping the Linearity Trap: Manifold Detours for Black-Box Adversarial Attacks on Singing Audio Deepfake Detection

Yifan Liao, Yule Liu, Zhen Sun, Zongmin Zhang +4 more

The paper introduces MARS, a novel meta-adversarial framework that significantly improves black-box adversarial attacks against state-of-the-art Singing Voice Deepfake Detection (SVDD) systems by esca…

cs.CL · cs.AI · eess.AS · Recent · May 31, 2026

PolySpeech-100: A Large-Scale Benchmark for Speech Understanding Across 100+ Languages and Dialects

Sicheng Yang, Shulan Ruan, Shiwei Wu, Yu Liu +3 more

PolySpeech-100 introduces a large multilingual benchmark covering 110 linguistic variants to rigorously test Speech-LLMs, demonstrating that open-source models struggle with low-resource languages…

eess.AS · cs.AI · cs.HC · Recent · May 27, 2026

I Hear, Therefore I Trust: A Socio-Technical Investigation of Humans as Synthetic Speech Detectors

Lelia Erscoi, Tomi Kinnunen

This study investigates how humans detect synthetic speech in real-world contexts, finding that while overt detection failed for fully synthetic speech, participants still implicitly discriminated utt…

cs.LG · cs.AI · cs.CR · Recent · Apr 18, 2026

Channel-Level Semantic Perturbations: Unlearnable Examples for Diverse Training Paradigms

Bo Wang, Jia Ni, Mengnan Zhao, Zhan Qin +1 more

This paper systematically investigates unlearnable examples (UEs) across diverse training paradigms, finding that existing UEs fail under pretraining-finetuning (PF) settings, and proposes Shallow Sem…

cs.CR · cs.SD · Recent · Apr 17, 2026

Benign Fine-Tuning Breaks Safety Alignment in Audio LLMs

Jaechul Roh, Amir Houmansadr

This paper demonstrates that benign fine-tuning significantly degrades safety in Audio LLMs, showing that the vulnerability is distinct from text and vision modalities and is highly dependent on the m…

cs.CR · cs.LG · cs.SD · Recent · Mar 18, 2026

STEP: Detecting Audio Backdoor Attacks via Stability-based Trigger Exposure Profiling

Kun Wang, Meng Chen, Junhao Wang, Yuli Wu +5 more

STEP introduces a novel, black-box, retraining-free detector that profiles audio samples using dual perturbation branches to detect backdoor attacks by exploiting the characteristic instability of hid…

cs.CR · cs.AI · Recent · Apr 30, 2026

Latent Adversarial Detection: Adaptive Probing of LLM Activations for Multi-Turn Attack Detection

Prashant Kulkarni

The paper introduces 'adversarial restlessness,' an activation-level signature in LLM residual streams, to detect multi-turn prompt injection attacks with high accuracy.

cs.AI · Recent · May 28, 2026

Tiny but Trusted: Efficient Vision-Language Reasoning for Time-Series Anomaly Detection

Xiaona Zhou, Muntasir Wahed, Tianjiao Yu, Constantin Brif +1 more

The paper introduces VisAnomReasoner, a parameter-efficient Vision-Language Model (VLM) trained on a new benchmark (VisAnomBench), to accurately and interpretably detect anomalies in time-series data.
