~ similar to 2603.18103v1· 17 results
Zida Li, Jun Li, Yuzhe Sha, Ziqiang Li +2 more
The paper introduces SET, a robust input-level backdoor detection framework that detects hidden malicious triggers in text-to-image diffusion models by analyzing systematic differences in how benign a…
This paper provides a unified taxonomy and controlled empirical evaluation of jailbreak attacks and defenses for Large Audio Language Models (LALMs), demonstrating that safety evaluation must consider…
The paper proposes a two-stage robust aggregation framework to detect and mitigate stealthy backdoor attacks in Over-the-air Federated Learning (OTA-FL) systems, effectively maintaining main-task accu…
Kai Wang, Jiale Zhang, Chengcheng Zhu, Chuang Ma +1 more
The paper proposes Hydra, a framework to stabilize and control the injection of multiple, conflicting backdoor triggers into text-to-image diffusion models, ensuring high attack reliability while main…
Jiali Wei, Ming Fan, Guoheng Sun, Xicheng Zhang +2 more
The paper introduces BadStyle, a novel backdoor attack framework that generates natural, stealthy poisoned samples using LLMs to compromise various LLMs with high success rates and robust activation.
CLIP-Inspector (CI) is a novel model-level backdoor detection method that reconstructs potential triggers using out-of-distribution (OOD) images to verify the security of prompt-tuned CLIP models.
Ahmed Sabbah, Mohammed Kharma, Radi Jarrar, Samer Zein +1 more
This study longitudinally evaluates the adversarial robustness of Android malware detection systems over a decade, finding that temporal separation significantly degrades robustness due to concept dri…
Shengfang Zhai, Xiaoyang Ji, Yuling Shi, Haoran Gao +5 more
The paper introduces BadDLM, a unified framework that demonstrates a new class of backdoor vulnerabilities in Diffusion Language Models (DLMs) by exploiting their forward masking process across divers…
The paper introduces Critical-CoT, a novel two-stage fine-tuning defense framework that equips LLMs with critical thinking abilities to detect and reject malicious reasoning steps introduced by advanc…
The paper proposes a novel cross-modal backdoor attack that exploits the vulnerability of lightweight connectors in multimodal LLMs, demonstrating high attack success rates across different modalities…
Yinbo Yu, Jing Fang, Xuewen Zhang, Chunwei Tian +3 more
The paper proposes DFBScanner, a lightweight static parameter inspection framework that detects backdoor attacks by analyzing anomalous parameter updates in the final classification layer, achieving f…
The paper demonstrates that LoRA adapters can be backdoored via data poisoning, showing the backdoor generalizes at the token feature level, and proposes robust behavioral and weight-level detectors f…
This paper demonstrates that LoRA adapters can be backdoored via data poisoning, showing that the resulting backdoor generalizes at the token feature level, and proposes robust behavioral and weight-l…
Meng Chen, Kun Wang, Li Lu, Jiaheng Zhang +1 more
The paper introduces AudioHijack, a framework that successfully demonstrates context-agnostic and imperceptible auditory prompt injection attacks, showing that commercial Large Audio-Language Models c…
The paper compares two sparse autoencoder architectures, finding that Differential SAEs (Diff-SAE) significantly outperform Crosscoders in isolating backdoor-related features in language models.
Mengnan Zhao, Lihe Zhang, Tianhang Zheng, Bo Wang +1 more
This paper reinterprets catastrophic overfitting (CO) in Fast Adversarial Training (FAT) as a weak backdoor mechanism, proposing backdoor-inspired strategies to mitigate this generalization failure.
Yanyun Wang, Yu Huang, Zi Liang, Xixin Wu +1 more
The paper introduces Acoustic Interference Attack (AIA), a novel jailbreak method that bypasses Large Audio Language Model (LALM) safety alignments by manipulating the underlying acoustic latent seman…