Similar to 2605.02715v1 · 11 results
Yifan Liao, Zongmin Zhang, Zhen Sun, Yuhui Sun +2 more
The paper introduces a novel Clean-Referenced Feature-Vocoder Attack, a black-box adversarial attack that perturbs high-level SSL feature representations instead of raw audio waveforms, achieving supe…
This paper proposes using random sampling of prediction precision during inference to significantly enhance the adversarial robustness of Automatic Speech Recognition (ASR) systems.
The paper proposes a Lagrangian sub-flow (LSF) framework and geometric diagnostic signals to improve out-of-distribution detection using Continuous Normalizing Flows, overcoming the likelihood paradox…
Yifan Liao, Yule Liu, Zhen Sun, Zongmin Zhang +4 more
The paper introduces MARS, a novel meta-adversarial framework that significantly improves black-box adversarial attacks against state-of-the-art Singing Voice Deepfake Detection (SVDD) systems by esca…
Sicheng Yang, Shulan Ruan, Shiwei Wu, Yu Liu +3 more
PolySpeech-100 introduces a massive, multilingual benchmark covering 110 linguistic variants to rigorously test Speech-LLMs, demonstrating that open-source models struggle with low-resource languages…
This study investigates how humans detect synthetic speech in real-world contexts, finding that while overt detection failed for fully synthetic speech, participants still implicitly discriminated utt…
Bo Wang, Jia Ni, Mengnan Zhao, Zhan Qin +1 more
This paper systematically investigates unlearnable examples (UEs) across diverse training paradigms, finding that existing UEs fail under pretraining-finetuning (PF) settings, and proposes Shallow Sem…
This paper demonstrates that benign fine-tuning significantly degrades safety in Audio LLMs, showing that the vulnerability is distinct from text and vision modalities and is highly dependent on the m…
Kun Wang, Meng Chen, Junhao Wang, Yuli Wu +5 more
STEP introduces a novel, black-box, retraining-free detector that profiles audio samples using dual perturbation branches to detect backdoor attacks by exploiting the characteristic instability of hid…
The paper introduces 'adversarial restlessness,' an activation-level signature in LLM residual streams, to detect multi-turn prompt injection attacks with high accuracy.
Xiaona Zhou, Muntasir Wahed, Tianjiao Yu, Constantin Brif +1 more
The paper introduces VisAnomReasoner, a parameter-efficient Vision-Language Model (VLM), trained on a new benchmark (VisAnomBench) to accurately and interpretably detect anomalies in time-series data.