Papers similar to 2605.10987v1

~ similar to 2605.10987v1· 20 results

cs.CRcs.AIcs.CLRecentApr 16, 2026

Route to Rome Attack: Directing LLM Routers to Expensive Models via Adversarial Suffix Optimization

Haochun Tang, Yuliang Yan, Jiahua Lu, Huaxiao Liu +1 more

The paper introduces R$^2$A, an adversarial attack that uses suffix optimization to mislead black-box LLM routers into consistently selecting expensive, high-capability models.

View →

cs.CRcs.LGcs.SERecentJun 3, 2026

Toward a Generalized Defense Across Sparse, Continuous, and Structured Parameter Attacks

Bin Duan, Zeyu Bai, Guowei Yang

The paper introduces ParDef, a generalized defense mechanism that effectively mitigates various types of parameter attacks on deep neural networks while maintaining high performance.

View →

cs.CRcs.AIcs.LGRecentMay 14, 2026

One Step to the Side: Why Defenses Against Malicious Finetuning Fail Under Adaptive Adversaries

Itay Zloczower, Eyal Lenga, Gilad Gressel, Yisroel Mirsky

The paper demonstrates that current defenses against malicious fine-tuning of foundation models are insufficient because they only address fixed attacks, and introduces a unified adaptive attack that…

View →

cs.CRcs.LGRecentApr 2, 2026

AEGIS: Adversarial Entropy-Guided Immune System -- Thermodynamic State Space Models for Zero-Day Network Evasion Detection

Vickson Ferrel

AEGIS introduces a novel physics-based system that analyzes encrypted network traffic flow dynamics, achieving state-of-the-art zero-day evasion detection with high accuracy and low latency.

View →

cs.CRcs.AIRecentApr 16, 2026

SecureRouter: Encrypted Routing for Efficient Secure Inference

Yukuan Zhang, Mengxin Zheng, Qian Lou

SecureRouter is an encrypted routing and inference framework that accelerates secure transformer inference by adaptively selecting the optimal model size based on the encrypted input, achieving a 1.95…

View →

cs.CRcs.MARecentJun 4, 2026

ZERO-APT: A Closed-Loop Adversarial Framework for LLM-Driven Automated Penetration Testing under Intelligent Defense

Anlan Zheng, Tiantian Zhu

ZERO-APT introduces a novel closed-loop adversarial framework for automated penetration testing that simulates attacks against an intelligent, real-time defending system, achieving a high attack succe…

View →

cs.CRcs.LGRecentMar 19, 2026

Towards Verifiable AI with Lightweight Cryptographic Proofs of Inference

Pranay Anchuri, Matteo Campanelli, Paul Cesaretti, Rosario Gennaro +3 more

The paper introduces a lightweight, sampling-based cryptographic protocol for verifiable AI inference that drastically reduces proving overhead from minutes to milliseconds by leveraging statistical p…

View →

cs.CRcs.DBRecentApr 27, 2026

Poisoning Learned Index Structures: Static and Dynamic Adversarial Attacks on ALEX

Allen Jue

The paper systematically evaluates static and dynamic adversarial attacks on the ALEX learned index, finding that while static poisoning has minimal impact, dynamic attacks can cause significant slowd…

View →

cs.CRcs.AIcs.MARecentApr 18, 2026

enclawed: A Configurable, Sector-Neutral Hardening Framework for Single-User AI Assistant Gateways

Alfredo Metere

enclawed is a configurable, hard-fork hardening framework for AI assistant gateways that enforces strict security controls, verifiable trust, and auditable connectivity for regulated environments.

View →

cs.LGcs.AIcs.CRRecentApr 27, 2026

Unveiling the Backdoor Mechanism Hidden Behind Catastrophic Overfitting in Fast Adversarial Training

Mengnan Zhao, Lihe Zhang, Tianhang Zheng, Bo Wang +1 more

This paper reinterprets catastrophic overfitting (CO) in Fast Adversarial Training (FAT) as a weak backdoor mechanism, proposing backdoor-inspired strategies to mitigate this generalization failure.

View →

cs.CRRecentMar 20, 2026

Constraint Migration: A Formal Theory of Throughput in AI Cybersecurity Pipelines

Surasak Phetmanee

The paper develops a formal theory to analyze how throughput changes in AI-enhanced cybersecurity pipelines when stage capacities are perturbed by multipliers.

View →

cs.CRcs.CVRecentMay 19, 2026

Exposing Functional Fusion: A New Class of Strategic Backdoor in Dynamic Prompt Architectures

Zeyao Liu, Zhendong Zhao, Xiaojun Chen, Xin Zhao +2 more

The paper introduces VIPER, a novel backdoor attack framework that exploits the functional fusion of malicious and benign logic within dynamic prompt architectures, demonstrating a new, high-risk thre…

View →

cs.CRRecentApr 15, 2026

NeuroTrace: Inference Provenance-Based Detection of Adversarial Examples

Firas Ben Hmida, Philemon Hailemariam, Kashif Ali Khan, Birhanu Eshete

NeuroTrace introduces a novel framework using Inference Provenance Graphs (IPGs) to analyze the information flow during deep neural network inference, demonstrating that this provenance provides a rob…

View →

cs.CRcs.AIcs.LGRecentMay 29, 2026

Depth-Dependent Indirect Prompt Injection in Tool-Calling ReAct Agents: Injection Depth, Payload Framing, and Turn-Budget Sensitivity

Mohammadreza Rashidi

This paper investigates indirect prompt injection vulnerabilities in ReAct agents by systematically analyzing how the injection depth and payload framing affect attack success rates, finding that inje…

View →

cs.CRcs.AIcs.LGRecentMay 29, 2026

Depth-Dependent Indirect Prompt Injection in Tool-Calling ReAct Agents: Injection Depth, Payload Framing, and Turn-Budget Sensitivity

Mohammadreza Rashidi

The paper investigates indirect prompt injection vulnerabilities in ReAct agents by systematically varying the injection depth, payload framing, and turn budget, finding that injection depth is the do…

View →

cs.CRcs.AIcs.CLRecentMar 25, 2026

AI Security in the Foundation Model Era: A Comprehensive Survey from a Unified Perspective

Zhenyi Wang, Siyu Luan

The paper proposes a unified closed-loop threat taxonomy to systematically analyze and defend foundation models by explicitly framing the bidirectional security interactions between data and models.

View →

cs.CRcs.AIRecentMay 8, 2026

WebTrap: Stealthy Mid-Task Hijacking of Browser Agents During Navigation

Zhichao Liu, Wenbo Pan, Haining Yu, Ge Gao +2 more

WebTrap introduces a stealthy, mid-task hijacking attack that successfully compromises browser agents during long-horizon tasks by seamlessly fusing malicious instructions with the original user goal.

View →

cs.CRRecentJun 3, 2026

DIST-FL: Enhancing Security for TEE-based Aggregation in Federated Learning

Guanlong Wu, Ju Yang, Zhen Huang, Jianyu Niu +3 more

The paper proposes DIST-FL, a distributed system using multiple TEEs and an append-only ledger to enhance the security and robustness of federated learning aggregation against server-side adversaries.

View →

cs.CRcs.CLcs.IRRecentMay 27, 2026

A Wolf in Sheep's Clothing: Targeted Routing Hijacking in Federated RAG

Junjie Mu, Qiongxiu Li

The paper introduces 'Routing Hijacking,' a severe attack where malicious clients forge semantic profiles in Federated RAG systems to misroute target queries, and proposes a trust-aware post-routing f…

View →

cs.CRcs.AIcs.LGRecentMay 22, 2026

PoisonForge: Task-Level Targeted Poisoning Benchmark for Instruction-Tuned LLMs

Luze Sun, Anshuman Suri, Harsh Chaudhari, Cristina Nita-Rotaru +1 more

The paper introduces PoisonForge, a comprehensive benchmark demonstrating that even a small number of targeted poisoned examples can significantly compromise the safety and reliability of instruction-…

View →