~ similar to 2605.30366v1· 15 results
Ke Liu, Jiwei Wei, Wenyu Zhang, Shuchang Zhou +4 more
The paper introduces a new dataset (SHDF) and a framework (T-AVFD) to robustly detect audio-visual deepfakes, specifically addressing the challenge posed by singing vocalizations.
Yifan Liao, Zongmin Zhang, Zhen Sun, Yuhui Sun +2 more
The paper introduces a novel Clean-Referenced Feature-Vocoder Attack, a black-box adversarial attack that perturbs high-level SSL feature representations instead of raw audio waveforms, achieving supe…
Meng Chen, Kun Wang, Li Lu, Jiaheng Zhang +1 more
The paper introduces AudioHijack, a framework that successfully demonstrates context-agnostic and imperceptible auditory prompt injection attacks, showing that commercial Large Audio-Language Models c…
The paper proposes a unified, architecture-agnostic framework that significantly improves the robustness of deepfake image detectors against adversarial attacks by focusing on higher-order frequency s…
This paper provides a unified taxonomy and controlled empirical evaluation of jailbreak attacks and defenses for Large Audio Language Models (LALMs), demonstrating that safety evaluation must consider…
Vojtěch Staněk, Martin Perešíni, Lukáš Sekanina, Anton Firc +1 more
The paper proposes an evolutionary multi-objective score fusion framework that efficiently combines multiple deepfake speech detectors to achieve state-of-the-art accuracy while significantly reducing…
Yihui Wang, Yonghui Yang, Jilong Liu, Fengbin Zhu +2 more
The paper proposes the Shortcut Subspace Suppression (S^3) framework to improve deepfake detection generalization by explicitly identifying and suppressing method-specific shortcuts in learned feature…
The paper introduces DECKER, a domain-invariant framework that significantly improves cross-keyboard keystroke inference by normalizing device variations and leveraging linguistic context, demonstrati…
Leitao Yuan, Qinghua Mao, Daizong Liu, Kun Wang +4 more
The paper proposes FRA-Attack, a frequency-domain regularization method, to significantly improve the transferability of adversarial attacks against closed-source Multimodal Large Language Models (MLL…
This study investigates how humans detect synthetic speech in real-world contexts, finding that while overt detection failed for fully synthetic speech, participants still implicitly discriminated utt…
This paper proposes using random sampling of prediction precision during inference to significantly enhance the adversarial robustness of Automatic Speech Recognition (ASR) systems.
The paper introduces Multi-Clip Video (MCV) SafetyBench, a dataset demonstrating that the vulnerability of Multimodal Large Language Models (MLLMs) to jailbreaking increases with the diversity and num…
Kun Wang, Meng Chen, Junhao Wang, Yuli Wu +5 more
STEP introduces a novel, black-box, retraining-free detector that profiles audio samples using dual perturbation branches to detect backdoor attacks by exploiting the characteristic instability of hid…
The study demonstrates that robust, domain-invariant representations of synthetic deception can be rapidly entrenched in LLMs using modest fine-tuning, detectable by linear probes even in early layers…
Zheng Fang, Xiaosen Wang, Shenyi Zhang, Shaokang Wang +1 more
The paper introduces Token-Aware Gradient Optimization (TAGO), demonstrating that sparse optimization focusing only on high-gradient audio tokens is sufficient for effective jailbreaking of audio lang…