Papers similar to 2605.30366v1

~ similar to 2605.30366v1· 15 results

cs.AIcs.MMcs.SDRecentMay 27, 2026

From Talking to Singing: A New Challenge for Audio-Visual Deepfake Detection

Ke Liu, Jiwei Wei, Wenyu Zhang, Shuchang Zhou +4 more

The paper introduces a new dataset (SHDF) and a framework (T-AVFD) to robustly detect audio-visual deepfakes, specifically addressing the challenge posed by singing vocalizations.

View →

cs.SDcs.AIcs.CRRecentJun 4, 2026

Beyond Waveform Robustness: Robust Feature-Vocoder Adversarial Attacks on Automatic Speech Recognition

Yifan Liao, Zongmin Zhang, Zhen Sun, Yuhui Sun +2 more

The paper introduces a novel Clean-Referenced Feature-Vocoder Attack, a black-box adversarial attack that perturbs high-level SSL feature representations instead of raw audio waveforms, achieving supe…

View →

cs.CRcs.AIcs.SDRecentApr 16, 2026

Hijacking Large Audio-Language Models via Context-Agnostic and Imperceptible Auditory Prompt Injection

Meng Chen, Kun Wang, Li Lu, Jiaheng Zhang +1 more

The paper introduces AudioHijack, a framework that successfully demonstrates context-agnostic and imperceptible auditory prompt injection attacks, showing that commercial Large Audio-Language Models c…

View →

cs.CRRecentJun 1, 2026

On Improving Robustness of Deepfake Image Detectors

Abu Taib Mohammed Shahjahan, Mohammad Mannan, Abdessamad Ben Hamza, Amr Youssef

The paper proposes a unified, architecture-agnostic framework that significantly improves the robustness of deepfake image detectors against adversarial attacks by focusing on higher-order frequency s…

View →

cs.SDcs.AIcs.CLRecentMay 28, 2026

Audio Jailbreaks in Large Audio-Language Models: Taxonomy, Attack-Defense Analysis, and Cost-Aware Evaluation

Bo-Han Feng, Yu-Hsuan Li Liang, Chien-Feng Liu, You-Hsuan Chang +1 more

This paper provides a unified taxonomy and controlled empirical evaluation of jailbreak attacks and defenses for Large Audio Language Models (LALMs), demonstrating that safety evaluation must consider…

View →

cs.SDcs.AIcs.CRRecentApr 1, 2026

Evolutionary Multi-Objective Fusion of Deepfake Speech Detectors

Vojtěch Staněk, Martin Perešíni, Lukáš Sekanina, Anton Firc +1 more

The paper proposes an evolutionary multi-objective score fusion framework that efficiently combines multiple deepfake speech detectors to achieve state-of-the-art accuracy while significantly reducing…

View →

cs.CVcs.AIRecentJun 1, 2026

Suppressing Forgery-Specific Shortcuts for Generalizable Deepfake Detection

Yihui Wang, Yonghui Yang, Jilong Liu, Fengbin Zhu +2 more

The paper proposes the Shortcut Subspace Suppression (S^3) framework to improve deepfake detection generalization by explicitly identifying and suppressing method-specific shortcuts in learned feature…

View →

cs.CRcs.SDRecentMay 5, 2026

DECKER: Domain-invariant Embedding for Cross-Keyboard Extraction and Recognition

Bikrant Bikram Pratap Maurya, Nitin Choudhury, Daksh Agarwal, Arun Balaji Buduru

The paper introduces DECKER, a domain-invariant framework that significantly improves cross-keyboard keystroke inference by normalizing device variations and leveraging linguistic context, demonstrati…

View →

cs.CRcs.AIcs.LGRecentMay 20, 2026

Frequency-Domain Regularized Adversarial Alignment for Transferable Attacks against Closed-Source MLLMs

Leitao Yuan, Qinghua Mao, Daizong Liu, Kun Wang +4 more

The paper proposes FRA-Attack, a frequency-domain regularization method, to significantly improve the transferability of adversarial attacks against closed-source Multimodal Large Language Models (MLL…

View →

eess.AScs.AIcs.HCRecentMay 27, 2026

I Hear, Therefore I Trust: A Socio-Technical Investigation of Humans as Synthetic Speech Detectors

Lelia Erscoi, Tomi Kinnunen

This study investigates how humans detect synthetic speech in real-world contexts, finding that while overt detection failed for fully synthetic speech, participants still implicitly discriminated utt…

View →

cs.LGcs.CReess.ASRecentMar 23, 2026

Precision-Varying Prediction (PVP): Robustifying ASR systems against adversarial attacks

Matías Pizarro, Raghavan Narasimhan, Asja Fischer

This paper proposes using random sampling of prediction precision during inference to significantly enhance the adversarial robustness of Automatic Speech Recognition (ASR) systems.

View →

cs.CVcs.AIcs.CLRecentJun 1, 2026

Jailbreaking Multimodal Large Language Models using Multi-Clip Video

Choongwon Kang, Seungjong Sun, Hyunmin Jun, Jang Hyun Kim

The paper introduces Multi-Clip Video (MCV) SafetyBench, a dataset demonstrating that the vulnerability of Multimodal Large Language Models (MLLMs) to jailbreaking increases with the diversity and num…

View →

cs.CRcs.LGcs.SDRecentMar 18, 2026

STEP: Detecting Audio Backdoor Attacks via Stability-based Trigger Exposure Profiling

Kun Wang, Meng Chen, Junhao Wang, Yuli Wu +5 more

STEP introduces a novel, black-box, retraining-free detector that profiles audio samples using dual perturbation branches to detect backdoor attacks by exploiting the characteristic instability of hid…

View →

cs.LGcs.AIRecentMay 28, 2026

When LLMs Learn to Be Consistently Wrong: A Multi-Model Study of Linear Representations of Synthetic Deception

Vahideh Zolfaghari

The study demonstrates that robust, domain-invariant representations of synthetic deception can be rapidly entrenched in LLMs using modest fine-tuning, detectable by linear probes even in early layers…

View →

cs.CRcs.AIcs.CLRecentMay 6, 2026

Sparse Tokens Suffice: Jailbreaking Audio Language Models via Token-Aware Gradient Optimization

Zheng Fang, Xiaosen Wang, Shenyi Zhang, Shaokang Wang +1 more

The paper introduces Token-Aware Gradient Optimization (TAGO), demonstrating that sparse optimization focusing only on high-gradient audio tokens is sufficient for effective jailbreaking of audio lang…

View →