Papers similar to 2605.27689v1

~ similar to 2605.27689v1· 20 results

cs.LGcs.CRcs.CVRecentMay 22, 2026

Sample-wise Targeted Adversarial Attacks on Test-time Adaptation

The paper introduces a sample-wise targeted adversarial attack that successfully misclassifies only specific, triggered inputs during test-time adaptation while maintaining the overall label distribut…

View →

cs.CRcs.AIRecentMay 8, 2026

CyBiasBench: Benchmarking Bias in LLM Agents for Cyber-Attack Scenarios

Taein Lim, Seongyong Ju, Munhyeok Kim, Hyunjun Kim +1 more

The paper introduces CyBiasBench, a comprehensive benchmark that quantifies the inherent, agent-specific bias in LLM agents' attack selection patterns in cybersecurity scenarios.

View →

cs.LGcs.AIRecentMay 30, 2026

COPF: An Online Framework for Deployment-Stable Counterfactual Fairness in Evolving Graphs

Sheng'en Li, Dongmian Zou

The paper introduces COPF, an online framework that ensures deployment-stable counterfactual fairness in link recommendation systems operating on evolving graphs by monitoring and controlling group di…

View →

cs.CRcs.AIRecentMay 18, 2026

Prompts Don't Protect: Architectural Enforcement via MCP Proxy for LLM Tool Access Control

Rohith Uppala

The paper proposes an architectural proxy (MCP) to enforce robust, reliable tool access control for LLM agents, demonstrating that this structural enforcement is necessary because prompt-based restric…

View →

cs.MAcs.AIcs.CRRecentApr 24, 2026

Beyond Single-Agent Alignment: Preventing Context-Fragmented Violations in Multi-Agent Systems

Jie Wu, Ming Gong

The paper introduces Distributed Sentinel, a zero-trust architecture that prevents Context-Fragmented Violations (CFVs) in multi-agent systems by propagating security state across departmental boundar…

View →

cs.DBcs.AIcs.CRRecentMay 22, 2026

CHRONOS: Temporally-Aware Multi-Agent Coordination for Evolving Data Marketplaces

Joydeep Chandra

CHRONOS is a novel three-layer architecture designed to address coupled failures in temporal data marketplaces by integrating temporal decay, changepoint-aware pricing, and differential privacy for ro…

View →

cs.CRcs.AIRecentMay 12, 2026

IPI-proxy: An Intercepting Proxy for Red-Teaming Web-Browsing AI Agents Against Indirect Prompt Injection

Chia-Pei, Chen, Kentaroh Toyoda, Anita Lai +1 more

The paper introduces IPI-proxy, an open-source intercepting proxy toolkit designed to red-team web-browsing AI agents by injecting adversarial payloads into live HTTP responses from whitelisted domain…

View →

cs.CRcs.AIcs.CLRecentJun 3, 2026

Domain-Conditioned Safety in Frontier Computer-Using Agents: A 793-Episode Browser Benchmark, a Coding-Domain Cross-Reference, and a Reproducibility Audit of Recent Red-Teaming

Nicholas Saban

The paper benchmarks current frontier computer-using agents against hand-crafted attacks, finding that while they are highly safe in browser tasks, this safety does not generalize to other domains lik…

View →

cs.CRcs.AIcs.SERecentMay 5, 2026

MOSAIC-Bench: Measuring Compositional Vulnerability Induction in Coding Agents

Jonathan Steinberg, Oren Gal

The paper introduces MOSAIC-Bench, a benchmark demonstrating that coding agents can ship exploitable code by complying with seemingly innocuous, staged tasks, a vulnerability that is not easily mitiga…

View →

cs.CRcs.AIcs.LGRecentMay 14, 2026

One Step to the Side: Why Defenses Against Malicious Finetuning Fail Under Adaptive Adversaries

Itay Zloczower, Eyal Lenga, Gilad Gressel, Yisroel Mirsky

The paper demonstrates that current defenses against malicious fine-tuning of foundation models are insufficient because they only address fixed attacks, and introduces a unified adaptive attack that…

View →

cs.CRcs.AIcs.LGRecentMay 26, 2026

HARP: Measuring Harm Amplification in Multi-Agent LLM Systems

Md Hafizur Rahman, Zafaryab Haider, Tanzim Mahfuz, Prabuddha Chakraborty

The paper introduces HARP, a new methodology to measure how localized harm (like compromising one agent) can be amplified into significant, system-wide harm within complex multi-agent LLM workflows.

View →

cs.CRcs.LGRecentApr 22, 2026

Towards Certified Malware Detection: Provable Guarantees Against Evasion Attacks

Nandakrishna Giri, Asmitha K. A., Serena Nicolazzo, Antonino Nocera +1 more

The paper proposes a certifiably robust malware detection framework using randomized smoothing and feature ablation to guarantee detection accuracy against metamorphic evasion attacks.

View →

cs.CRcs.CLcs.IRRecentMay 27, 2026

A Wolf in Sheep's Clothing: Targeted Routing Hijacking in Federated RAG

Junjie Mu, Qiongxiu Li

The paper introduces 'Routing Hijacking,' a severe attack where malicious clients forge semantic profiles in Federated RAG systems to misroute target queries, and proposes a trust-aware post-routing f…

View →

cs.CRcs.AIcs.LGRecentMay 22, 2026

PoisonForge: Task-Level Targeted Poisoning Benchmark for Instruction-Tuned LLMs

Luze Sun, Anshuman Suri, Harsh Chaudhari, Cristina Nita-Rotaru +1 more

The paper introduces PoisonForge, a comprehensive benchmark demonstrating that even a small number of targeted poisoned examples can significantly compromise the safety and reliability of instruction-…

View →

cs.CRcs.AIRecentMay 26, 2026

ChainCaps: Composition-Safe Tool-Using Agents via Monotonic Capability Attenuation

Xiaochong Jiang, Shiqi Yang, Ziwei Li, Lifei Liu +2 more

ChainCaps introduces a novel runtime capability budgeting system that prevents 'permission laundering' in complex tool-using agents, significantly reducing attack success rates while maintaining benig…

View →

cs.CRcs.LGRecentApr 14, 2026

Evaluating Differential Privacy Against Membership Inference in Federated Learning: Insights from the NIST Genomics Red Team Challenge

Gustavo de Carvalho Bertoli

This paper empirically evaluates the effectiveness of Differential Privacy (DP) against Membership Inference Attacks (MIAs) in Federated Learning, demonstrating that a stacking attack strategy can det…

View →

cs.CRcs.AIcs.PLRecentMay 1, 2026

Certified Purity for Cognitive Workflow Executors: From Static Analysis to Cryptographic Attestation

Alan L. McCann

The paper introduces a certified purity architecture that strengthens governance in cognitive workflow systems by replacing insufficient runtime checks with cryptographically attested structural guara…

View →

cs.CRstat.APRecentMay 8, 2026

Combating Organized Platform Abuse: Amplifying Weak Risk Signals with Structural Information

Meng He, Jia Long Loh

The paper proposes a novel structural invariant approach, derived from the economic constraints of fraud, that amplifies weak, low-precision signals into highly accurate fraud detections without requi…

View →

cs.CRcs.AIcs.CLRecentMar 25, 2026

AI Security in the Foundation Model Era: A Comprehensive Survey from a Unified Perspective

Zhenyi Wang, Siyu Luan

The paper proposes a unified closed-loop threat taxonomy to systematically analyze and defend foundation models by explicitly framing the bidirectional security interactions between data and models.

View →

cs.CRcs.AIcs.LGRecentMay 11, 2026

Continuous Discovery of Vulnerabilities in LLM Serving Systems with Fuzzing

Yunze Zhao, Yibo Zhao, Yuchen Zhang, Zaoxing Liu +1 more

The paper introduces GRIEF, a greybox fuzzer that discovers critical, concurrency-related vulnerabilities in LLM serving systems by treating timed multi-request traces as inputs, finding issues like c…

View →