Papers similar to 2605.05928v1

~ similar to 2605.05928v1· 20 results

cs.LGcs.AIcs.CRRecentApr 27, 2026

Unveiling the Backdoor Mechanism Hidden Behind Catastrophic Overfitting in Fast Adversarial Training

Mengnan Zhao, Lihe Zhang, Tianhang Zheng, Bo Wang +1 more

This paper reinterprets catastrophic overfitting (CO) in Fast Adversarial Training (FAT) as a weak backdoor mechanism, proposing backdoor-inspired strategies to mitigate this generalization failure.

View →

cs.CRRecentMar 17, 2026

Poisoning the Pixels: Revisiting Backdoor Attacks on Semantic Segmentation

Guangsheng Zhang, Huan Tian, Leo Zhang, Tianqing Zhu +3 more

This paper systematically revisits and expands the threat model for backdoor attacks on semantic segmentation, proposing a unified framework (BADSEG) that demonstrates severe, previously overlooked vu…

View →

cs.LGcs.CRRecentMay 27, 2026

Density-aware Sample-specific Attack

Qiyuan Wang, Yao Li, Raymond K. W. Wong

This paper proposes a density-aware attack that constructs triggers by placing poisoned samples in low-density regions of the clean data distribution, achieving high attack success rates even after st…

View →

cs.CRRecentApr 27, 2026

DETOUR: A Practical Backdoor Attack against Object Detection

Dazhuang Liu, Yanqi Qiao, Rui Wang, Kaitai Liang +1 more

DETOUR proposes a practical backdoor attack against object detection models by using semantic triggers that are robust to variations in size, location, and field of view (FoV), overcoming limitations…

View →

cs.CLcs.AIcs.CRRecentMay 8, 2026

Activation Differences Reveal Backdoors: A Comparison of SAE Architectures

Sachin Kumar

The paper compares two sparse autoencoder architectures, finding that Differential SAEs (Diff-SAE) significantly outperform Crosscoders in isolating backdoor-related features in language models.

View →

cs.CRcs.AIRecentMay 17, 2026

Lightweight and Fast Backdoor Model Detection

Yinbo Yu, Jing Fang, Xuewen Zhang, Chunwei Tian +3 more

The paper proposes DFBScanner, a lightweight static parameter inspection framework that detects backdoor attacks by analyzing anomalous parameter updates in the final classification layer, achieving f…

View →

cs.CRRecentJun 1, 2026

On Improving Robustness of Deepfake Image Detectors

Abu Taib Mohammed Shahjahan, Mohammad Mannan, Abdessamad Ben Hamza, Amr Youssef

The paper proposes a unified, architecture-agnostic framework that significantly improves the robustness of deepfake image detectors against adversarial attacks by focusing on higher-order frequency s…

View →

cs.CRcs.AIcs.LGRecentMay 14, 2026

One Step to the Side: Why Defenses Against Malicious Finetuning Fail Under Adaptive Adversaries

Itay Zloczower, Eyal Lenga, Gilad Gressel, Yisroel Mirsky

The paper demonstrates that current defenses against malicious fine-tuning of foundation models are insufficient because they only address fixed attacks, and introduces a unified adaptive attack that…

View →

cs.CRcs.CVRecentMay 29, 2026

BadBone: Backdoor Attacks Against Backbone Models in Visual Prompt Learning

Ziqing Yang, Rui Wen, Xinlei He, Yun Shen +2 more

The paper introduces BadBone, a stealthy and adaptive backdoor attack that compromises a backbone model specifically to target downstream tasks utilizing prompt learning, demonstrating high attack suc…

View →

cs.CRcs.CVRecentApr 14, 2026

Scaling Exposes the Trigger: Input-Level Backdoor Detection in Text-to-Image Diffusion Models via Cross-Attention Scaling

Zida Li, Jun Li, Yuzhe Sha, Ziqiang Li +2 more

The paper introduces SET, a robust input-level backdoor detection framework that detects hidden malicious triggers in text-to-image diffusion models by analyzing systematic differences in how benign a…

View →

cs.CRcs.LGRecentMay 19, 2026

Awakening the Hydra: Stabilizing Multi-Concept Backdoor Injection in Text-to-Image Diffusion Models

Kai Wang, Jiale Zhang, Chengcheng Zhu, Chuang Ma +1 more

The paper proposes Hydra, a framework to stabilize and control the injection of multiple, conflicting backdoor triggers into text-to-image diffusion models, ensuring high attack reliability while main…

View →

cs.CRcs.AIcs.CLRecentMay 28, 2026

Token-Level Generalization in LoRA Adapter Backdoors: Attack Characterization and Behavioral Detection

Travis Lelle

The paper demonstrates that LoRA adapters can be backdoored via data poisoning, showing the backdoor generalizes at the token feature level, and proposes robust behavioral and weight-level detectors f…

View →

cs.CRcs.AIcs.CLRecentMay 28, 2026

Token-Level Generalization in LoRA Adapter Backdoors: Attack Characterization and Behavioral Detection

Travis Lelle

This paper demonstrates that LoRA adapters can be backdoored via data poisoning, showing that the resulting backdoor generalizes at the token feature level, and proposes robust behavioral and weight-l…

View →

cs.CRRecentApr 8, 2026

Can Drift-Adaptive Malware Detectors Be Made Robust? Attacks and Defenses Under White-Box and Black-Box Threats

Adrian Shuai Li, Md Ajwad Akil, Elisa Bertino

The paper proposes a universal robustification framework to enhance drift-adaptive malware detectors against combined concept drift and adversarial attacks, significantly reducing attack success rates…

View →

cs.CRcs.AIRecentApr 23, 2026

CSC: Turning the Adversary's Poison against Itself

Yuchen Shi, Xin Guo, Huajie Chen, Tianqing Zhu +2 more

The paper proposes Cluster Segregation Concealment (CSC), a novel defense that identifies and neutralizes backdoor triggers by relabeling poisoned samples to a virtual class, achieving near-zero attac…

View →

cs.CRcs.AIcs.LGRecentMay 5, 2026

Undetectable Backdoors in Model Parameters: Hiding Sparse Secrets in High Dimensions

Sarthak Choudhary, Atharv Singh Patlan, Nils Palumbo, Ashish Hooda +2 more

The paper introduces Sparse Backdoor, a novel supply-chain attack that embeds a provably undetectable backdoor into pre-trained image classifiers by injecting structured sparse perturbations.

View →

cs.CRcs.AIcs.CVRecentApr 10, 2026

CLIP-Inspector: Model-Level Backdoor Detection for Prompt-Tuned CLIP via OOD Trigger Inversion

Akshit Jindal, Saket Anand, Chetan Arora, Vikram Goyal

CLIP-Inspector (CI) is a novel model-level backdoor detection method that reconstructs potential triggers using out-of-distribution (OOD) images to verify the security of prompt-tuned CLIP models.

View →

cs.CVcs.CRRecentApr 21, 2026

PASTA: A Patch-Agnostic Twofold-Stealthy Backdoor Attack on Vision Transformers

Dazhuang Liu, Yanqi Qiao, Rui Wang, Kaitai Liang +1 more

PASTA proposes a novel, twofold stealthy backdoor attack that enables high-success-rate backdoor activation across arbitrary patches in Vision Transformers by leveraging the Trigger Radiating Effect (…

View →

cs.CRcs.CVRecentMay 19, 2026

Exposing Functional Fusion: A New Class of Strategic Backdoor in Dynamic Prompt Architectures

Zeyao Liu, Zhendong Zhao, Xiaojun Chen, Xin Zhao +2 more

The paper introduces VIPER, a novel backdoor attack framework that exploits the functional fusion of malicious and benign logic within dynamic prompt architectures, demonstrating a new, high-risk thre…

View →

cs.CRcs.LGRecentApr 7, 2026

Stealthy and Adjustable Text-Guided Backdoor Attacks on Multimodal Pretrained Models

Yiyang Zhang, Chaojian Yu, Ziming Hong, Yuanjie Shao +3 more

The paper proposes a novel Text-Guided Backdoor (TGB) attack that uses common words in text descriptions as stealthy triggers for multimodal models, enhancing practicality and controllability.

View →