Papers similar to 2605.18908v1

~ similar to 2605.18908v1· 20 results

cs.CRcs.AIRecentMay 17, 2026

Lightweight and Fast Backdoor Model Detection

Yinbo Yu, Jing Fang, Xuewen Zhang, Chunwei Tian +3 more

The paper proposes DFBScanner, a lightweight static parameter inspection framework that detects backdoor attacks by analyzing anomalous parameter updates in the final classification layer, achieving f…

View →

cs.CRcs.CVRecentApr 14, 2026

Scaling Exposes the Trigger: Input-Level Backdoor Detection in Text-to-Image Diffusion Models via Cross-Attention Scaling

Zida Li, Jun Li, Yuzhe Sha, Ziqiang Li +2 more

The paper introduces SET, a robust input-level backdoor detection framework that detects hidden malicious triggers in text-to-image diffusion models by analyzing systematic differences in how benign a…

View →

cs.CRcs.AIcs.CVRecentApr 10, 2026

CLIP-Inspector: Model-Level Backdoor Detection for Prompt-Tuned CLIP via OOD Trigger Inversion

Akshit Jindal, Saket Anand, Chetan Arora, Vikram Goyal

CLIP-Inspector (CI) is a novel model-level backdoor detection method that reconstructs potential triggers using out-of-distribution (OOD) images to verify the security of prompt-tuned CLIP models.

View →

cs.LGcs.CRRecentMay 27, 2026

Density-aware Sample-specific Attack

Qiyuan Wang, Yao Li, Raymond K. W. Wong

This paper proposes a density-aware attack that constructs triggers by placing poisoned samples in low-density regions of the clean data distribution, achieving high attack success rates even after st…

View →

cs.CRcs.AIcs.CLRecentMay 28, 2026

Token-Level Generalization in LoRA Adapter Backdoors: Attack Characterization and Behavioral Detection

Travis Lelle

The paper demonstrates that LoRA adapters can be backdoored via data poisoning, showing the backdoor generalizes at the token feature level, and proposes robust behavioral and weight-level detectors f…

View →

cs.CRcs.AIcs.CLRecentMay 28, 2026

Token-Level Generalization in LoRA Adapter Backdoors: Attack Characterization and Behavioral Detection

Travis Lelle

This paper demonstrates that LoRA adapters can be backdoored via data poisoning, showing that the resulting backdoor generalizes at the token feature level, and proposes robust behavioral and weight-l…

View →

cs.CRcs.AIcs.LGRecentMay 8, 2026

Seed Hijacking of LLM Sampling and Quantum Random Number Defense

Ziyang You, Xiaoke Yang, Zhanling Fan, Feng Guo +2 more

The paper introduces SeedHijack, a backdoor attack that manipulates the pseudorandom number generation process in LLMs to force specific token selections, and proposes a hardware quantum random number…

View →

cs.CRcs.CVRecentMay 2, 2026

Checkerboard: A Simple, Effective, Efficient and Learning-free Clean Label Backdoor Attack with Low Poisoning Budget

Yi Yang, Jinyang Huang, Binbin Liu, Feng-Qi Cui +4 more

The paper introduces Checkerboard, a novel, learning-free clean-label backdoor attack that efficiently poisons training data to compromise model integrity with minimal poisoning budget.

View →

cs.CRcs.LGRecentMay 13, 2026

DiffusionHijack: Supply-Chain PRNG Backdoor Attack on Diffusion Models and Quantum Random Number Defense

Ziyang You, Liling Zheng, Xiaoke Yang, Xuxing Lu

The paper introduces DiffusionHijack, a supply-chain backdoor attack that compromises the PRNG used by diffusion models to deterministically control generated images, which is successfully mitigated b…

View →

cs.CRcs.AIRecentMar 29, 2026

SNEAKDOOR: Stealthy Backdoor Attacks against Distribution Matching-based Dataset Condensation

He Yang, Dongyi Lv, Song Ma, Wei Xi +1 more

Sneakdoor introduces a novel backdoor attack method that enhances stealthiness in dataset condensation by using a generative module to create input-aware triggers, achieving high attack efficacy while…

View →

cs.LGcs.CRRecentApr 21, 2026

Mechanistic Anomaly Detection via Functional Attribution

Hugo Lyons Keenan, Christopher Leckie, Sarah Erfani

The paper proposes reframing mechanistic anomaly detection (MAD) as a functional attribution problem, using influence functions to measure how much a model's output depends on specific input samples,…

View →

cs.CRcs.AIRecentJun 3, 2026

From Attack Simulation to SIEM Rule: Deterministic Detection-as-Code Synthesis with Probe-Level Traceability

Alexandre Cristovão Maiorano

The paper introduces a deterministic method to automatically synthesize initial SIEM detection rules (Sigma rules) from attack simulation findings, ensuring full traceability back to the specific orig…

View →

cs.CRcs.LGRecentMay 28, 2026

Dissecting the Black Box: Circuit-Level Analysis of LLM Vulnerability Detection

Syafiq Al Atiiq, Chun Zhou, Christian Gehrmann

The paper analyzes LLM vulnerability detection using mechanistic interpretability, finding that models primarily rely on safety detectors rather than direct vulnerability signature recognition.

View →

cs.CLcs.AIcs.CRRecentMay 8, 2026

Activation Differences Reveal Backdoors: A Comparison of SAE Architectures

Sachin Kumar

The paper compares two sparse autoencoder architectures, finding that Differential SAEs (Diff-SAE) significantly outperform Crosscoders in isolating backdoor-related features in language models.

View →

cs.CRcs.LGRecentMar 19, 2026

Towards Verifiable AI with Lightweight Cryptographic Proofs of Inference

Pranay Anchuri, Matteo Campanelli, Paul Cesaretti, Rosario Gennaro +3 more

The paper introduces a lightweight, sampling-based cryptographic protocol for verifiable AI inference that drastically reduces proving overhead from minutes to milliseconds by leveraging statistical p…

View →

cs.CRcs.CLRecentMay 14, 2026

MetaBackdoor: Exploiting Positional Encoding as a Backdoor Attack Surface in LLMs

Rui Wen, Mark Russinovich, Andrew Paverd, Jun Sakuma +1 more

The paper introduces MetaBackdoor, a novel class of LLM backdoor attacks that exploits positional encoding (length-based triggers) rather than requiring modifications to the textual content.

View →

cs.CRRecentMay 10, 2026

BadDLM: Backdooring Diffusion Language Models with Diverse Targets

Shengfang Zhai, Xiaoyang Ji, Yuling Shi, Haoran Gao +5 more

The paper introduces BadDLM, a unified framework that demonstrates a new class of backdoor vulnerabilities in Diffusion Language Models (DLMs) by exploiting their forward masking process across divers…

View →

cs.CRcs.LGRecentMay 20, 2026

HIDBench: Benchmarking Large Language Models for Host-Based Intrusion Detection

Danyu Sun, Jinghuai Zhang, Yuan Tian, Zhou Li

The paper introduces HIDBench, a new benchmark for evaluating LLMs' ability to perform host-based intrusion detection using complex, noisy system logs, finding that model performance degrades signific…

View →

cs.CRRecentApr 15, 2026

BackFlush: Knowledge-Free Backdoor Detection and Elimination with Watermark Preservation in Large Language Models

Jagadeesh Rachapudi, Ritali Vatsi, Pranav Singh, Praful Hambarde +1 more

BackFlush introduces a novel, knowledge-free framework that detects and eliminates unknown backdoor attacks in LLMs while simultaneously preserving existing watermarks, achieving high detection rates…

View →

cs.CRcs.AIcs.LGRecentMay 5, 2026

Undetectable Backdoors in Model Parameters: Hiding Sparse Secrets in High Dimensions

Sarthak Choudhary, Atharv Singh Patlan, Nils Palumbo, Ashish Hooda +2 more

The paper introduces Sparse Backdoor, a novel supply-chain attack that embeds a provably undetectable backdoor into pre-trained image classifiers by injecting structured sparse perturbations.

View →