~ similar to 2604.01330v1· 11 results
Yifan Liao, Yule Liu, Zhen Sun, Zongmin Zhang +4 more
The paper introduces MARS, a novel meta-adversarial framework that significantly improves black-box adversarial attacks against state-of-the-art Singing Voice Deepfake Detection (SVDD) systems by esca…
This study investigates how humans detect synthetic speech in real-world contexts, finding that while overt detection failed for fully synthetic speech, participants still implicitly discriminated utt…
Yifan Liao, Zongmin Zhang, Zhen Sun, Yuhui Sun +2 more
The paper introduces a novel Clean-Referenced Feature-Vocoder Attack, a black-box adversarial attack that perturbs high-level SSL feature representations instead of raw audio waveforms, achieving supe…
The paper proposes a unified, architecture-agnostic framework that significantly improves the robustness of deepfake image detectors against adversarial attacks by focusing on higher-order frequency s…
The paper proposes evaluating certified training methods by comparing their Pareto fronts across the natural-certified accuracy trade-off, revealing superior performance and previously unappreciated c…
Kun Wang, Meng Chen, Junhao Wang, Yuli Wu +5 more
STEP introduces a novel, black-box, retraining-free detector that profiles audio samples using dual perturbation branches to detect backdoor attacks by exploiting the characteristic instability of hid…
Ke Liu, Jiwei Wei, Wenyu Zhang, Shuchang Zhou +4 more
The paper introduces a new dataset (SHDF) and a framework (T-AVFD) to robustly detect audio-visual deepfakes, specifically addressing the challenge posed by singing vocalizations.
Sicheng Yang, Shulan Ruan, Shiwei Wu, Yu Liu +3 more
PolySpeech-100 introduces a massive, multi-lingual benchmark covering 110 linguistic variants to rigorously test Speech-LLMs, demonstrating that open-source models struggle with low-resource languages…
The paper introduces GRIDS, a framework using Local Intrinsic Dimensionality (LID) to detect anomalies in self-supervised speech model representations, showing that LID elevation correlates with ASR d…
Zheng Fang, Xiaosen Wang, Shenyi Zhang, Shaokang Wang +1 more
The paper introduces Token-Aware Gradient Optimization (TAGO), demonstrating that sparse optimization focusing only on high-gradient audio tokens is sufficient for effective jailbreaking of audio lang…
This paper proposes a 3D CNN detector that leverages temporal artifacts to accurately identify high-quality deepfake videos, demonstrating robust detection even after social media re-encoding.