The paper proposes DFBScanner, a lightweight static parameter inspection framework that detects backdoor attacks by analyzing anomalous parameter updates in the final classification layer, achieving fast and generalizable detection.
Deep neural networks (DNNs), despite their remarkable performance, are highly vulnerable to backdoor attacks. Existing defenses mainly rely on activation anomaly analysis or trigger reverse engineering and often require clean samples or prior knowledge of trigger patterns, which limits their efficacy, practicality, and generalizability. More critically, while advanced attacks can implant a backdoor in milliseconds, current detection approaches typically take minutes or even hours. To this end, we propose DFBScanner, a lightweight static parameter inspection framework for fast backdoor scanning. DFBScanner leverages our key observation that backdoor-induced feature perturbations can lead to distinctive, anomalous parameter updates in the final classification layer. We therefore shift the detection focus from recognizing the diverse, attack-specific trigger patterns targeted by prior work to identifying the unified manifestation of a backdoor within the final layer, enabling efficient and attack-agnostic detection. Specifically, DFBScanner constructs multiple anomaly indicators from the final-layer parameters, strategically combines them into a Trojan clue, and detects backdoors through maximum anomaly scoring. DFBScanner is evaluated on a large-scale backdoor benchmark comprising over 5,000 backdoored models trained on 4 datasets, 12 network architectures, 20 types of backdoor triggers, 2 attack strategies (all-to-one and all-to-all), and 3 backdoor injection methods (data poisoning, training pipeline manipulation, and bit flips). Numerical results show that DFBScanner achieves a 97.17% true-positive rate, a 0.95% false-positive rate, and an average detection time of only 1 ms per model, significantly outperforming prior methods.
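To make the idea of static final-layer scanning concrete, here is a minimal sketch in plain Python. The abstract does not specify DFBScanner's actual indicators or how they are combined, so everything below is a hypothetical stand-in: three per-class indicators (the L2 norm of each class's weight row, its bias, and its total positive weight), two-sided modified z-scores based on the median and MAD, a "clue" formed by averaging those z-scores, and a threshold of 3.5 (a common rule of thumb for modified z-scores). Only the overall shape, scoring every class from the final-layer parameters and taking the maximum, follows the description above. The names `robust_z_scores`, `class_indicators`, and `scan_final_layer` and the synthetic weights are illustrative only.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def robust_z_scores(values: Sequence[float], eps: float = 1e-12) -> List[float]:
    """Two-sided modified z-scores |x - median| / (1.4826 * MAD) across classes."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    scale = 1.4826 * mad + eps  # eps avoids division by zero when MAD == 0
    return [abs(v - med) / scale for v in values]


def class_indicators(weights: Sequence[Sequence[float]],
                     bias: Sequence[float]) -> List[List[float]]:
    """Hypothetical per-class indicators; `weights` has one row per output class."""
    norms = [math.sqrt(sum(w * w for w in row)) for row in weights]
    positive_mass = [sum(w for w in row if w > 0) for row in weights]
    return [norms, list(bias), positive_mass]


def scan_final_layer(weights: Sequence[Sequence[float]],
                     bias: Sequence[float],
                     threshold: float = 3.5) -> Tuple[bool, int, float]:
    """Return (flagged, most_anomalous_class, max_anomaly_score)."""
    indicator_z = [robust_z_scores(ind) for ind in class_indicators(weights, bias)]
    num_classes = len(bias)
    # Hypothetical "clue": average each class's z-scores over all indicators.
    clue = [statistics.fmean(z[c] for z in indicator_z) for c in range(num_classes)]
    # Maximum anomaly scoring: the model's score is that of its most anomalous class.
    target = max(range(num_classes), key=lambda c: clue[c])
    return clue[target] > threshold, target, clue[target]


# Synthetic 6-class, 4-feature final layer (illustrative numbers, not real data).
CLEAN_WEIGHTS = [
    [0.10, -0.20, 0.15, -0.05],
    [0.12, -0.18, 0.14, -0.06],
    [0.09, -0.21, 0.16, -0.04],
    [0.11, -0.19, 0.13, -0.05],
    [0.10, -0.22, 0.15, -0.07],
    [0.13, -0.20, 0.14, -0.05],
]
CLEAN_BIAS = [0.01, 0.02, 0.00, 0.01, 0.02, 0.01]

# Same layer with class 3 inflated, mimicking an all-to-one target class.
BACKDOORED_WEIGHTS = CLEAN_WEIGHTS[:3] + [[0.45, -0.10, 0.50, 0.30]] + CLEAN_WEIGHTS[4:]
BACKDOORED_BIAS = CLEAN_BIAS[:3] + [0.25] + CLEAN_BIAS[4:]

if __name__ == "__main__":
    print(scan_final_layer(CLEAN_WEIGHTS, CLEAN_BIAS)[0])  # → False
    flagged, target, _ = scan_final_layer(BACKDOORED_WEIGHTS, BACKDOORED_BIAS)
    print(flagged, target)  # → True 3
```

The median and MAD are used instead of the mean and standard deviation so that a single inflated target class cannot drag the baseline toward itself and hide its own anomaly. Because the scan reads only parameters, its cost grows with the size of the final layer rather than with any dataset. For a real PyTorch classifier, the inputs would be the final `nn.Linear` layer's `weight` (shape `out_features × in_features`, one row per class) and `bias`, converted to Python lists.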
Scaling Exposes the Trigger: Input-Level Backdoor Detection in Text-to-Image Diffusion Models via Cr…
The paper introduces SET, a robust input-level backdoor detection framework that…
STEP: Detecting Audio Backdoor Attacks via Stability-based Trigger Exposure Profiling
STEP introduces a novel, black-box, retraining-free detector that profiles audio…
Backdoor Attacks on Decentralised Post-Training
This paper introduces the first backdoor attack specifically targeting pipeline…
FL-PBM: Pre-Training Backdoor Mitigation for Federated Learning
The paper proposes FL-PBM, a novel pre-training defense mechanism for federated…
Physical Backdoor Attack Against Deep Learning-Based Modulation Classification
This paper proposes a physical backdoor attack against deep learning modulation…
CLIP-Inspector: Model-Level Backdoor Detection for Prompt-Tuned CLIP via OOD Trigger Inversion
CLIP-Inspector (CI) is a novel model-level backdoor detection method that recons…
Critical-CoT: A Robust Defense Framework against Reasoning-Level Backdoor Attacks in Large Language…
The paper introduces Critical-CoT, a novel two-stage fine-tuning defense framewo…
Beyond Corner Patches: Semantics-Aware Backdoor Attack in Federated Learning
This paper proposes SABLE, a method for generating semantically meaningful and i…