~ similar to 2605.08278v2· 20 results
Zida Li, Jun Li, Yuzhe Sha, Ziqiang Li +2 more
The paper introduces SET, a robust input-level backdoor detection framework that detects hidden malicious triggers in text-to-image diffusion models by analyzing systematic differences in how benign a…
The paper demonstrates that LoRA adapters can be backdoored via data poisoning, showing the backdoor generalizes at the token feature level, and proposes robust behavioral and weight-level detectors f…
This paper demonstrates that LoRA adapters can be backdoored via data poisoning, showing that the resulting backdoor generalizes at the token feature level, and proposes robust behavioral and weight-l…
This paper proposes a density-aware attack that constructs triggers by placing poisoned samples in low-density regions of the clean data distribution, achieving high attack success rates even after st…
The paper introduces BadSkill, a novel backdoor attack formulation that targets third-party agent skills by poisoning the embedded model artifacts, achieving high attack success rates across various m…
Kai Wang, Jiale Zhang, Chengcheng Zhu, Chuang Ma +1 more
The paper proposes Hydra, a framework to stabilize and control the injection of multiple, conflicting backdoor triggers into text-to-image diffusion models, ensuring high attack reliability while main…
Zeyao Liu, Zhendong Zhao, Xiaojun Chen, Xin Zhao +2 more
The paper introduces VIPER, a novel backdoor attack framework that exploits the functional fusion of malicious and benign logic within dynamic prompt architectures, demonstrating a new, high-risk thre…
Yinbo Yu, Jing Fang, Xuewen Zhang, Chunwei Tian +3 more
The paper proposes DFBScanner, a lightweight static parameter inspection framework that detects backdoor attacks by analyzing anomalous parameter updates in the final classification layer, achieving f…
The paper introduces 'adversarial restlessness,' an activation-level signature in LLM residual streams, to detect multi-turn prompt injection attacks with high accuracy.
The paper proposes a universal robustification framework to enhance drift-adaptive malware detectors against combined concept drift and adversarial attacks, significantly reducing attack success rates…
Yunhao Feng, Yifan Ding, Yingshui Tan, Boren Zheng +5 more
SkillTrojan introduces a novel backdoor attack targeting the composition of reusable skills in agent systems, demonstrating high attack success rates with minimal impact on normal system functionality…
This paper proposes SABLE, a method for generating semantically meaningful and in-distribution backdoor triggers for federated learning, demonstrating that such attacks remain a potent and practical t…
Yuchen Shi, Xin Guo, Huajie Chen, Tianqing Zhu +2 more
The paper proposes Cluster Segregation Concealment (CSC), a novel defense that identifies and neutralizes backdoor triggers by relabeling poisoned samples to a virtual class, achieving near-zero attac…
The paper compares two sparse autoencoder architectures, finding that Differential SAEs (Diff-SAE) significantly outperform Crosscoders in isolating backdoor-related features in language models.
The paper identifies a critical vulnerability, the Camouflage Detection Gap (CDG), where standard LLM injection detectors fail dramatically when malicious payloads mimic the target domain's language a…
Taein Lim, Seongyong Ju, Munhyeok Kim, Hyunjun Kim +1 more
The paper introduces CyBiasBench, a comprehensive benchmark that quantifies the inherent, agent-specific bias in LLM agents' attack selection patterns in cybersecurity scenarios.
The paper evaluates graph-context LLM defenders against multi-round, adaptive fraud attacks, finding that while graph context improves early safety, it significantly increases benign over-refusal due…
Mengnan Zhao, Lihe Zhang, Tianhang Zheng, Bo Wang +1 more
This paper reinterprets catastrophic overfitting (CO) in Fast Adversarial Training (FAT) as a weak backdoor mechanism, proposing backdoor-inspired strategies to mitigate this generalization failure.
Xiangtao Meng, Wenyu Chen, Chuanchao Zang, Xinyu Gao +4 more
This paper systematically measures and explains how sequential model defenses can conflict, finding that 38.9% of ordered defense sequences cause measurable risk exacerbation due to anti-aligned param…
The paper demonstrates that current defenses against malicious fine-tuning of foundation models are insufficient because they only address fixed attacks, and introduces a unified adaptive attack that…