Papers similar to 2604.04978v2

20 results

cs.SE · cs.AI · cs.CL · May 18, 2026

Overeager Coding Agents: Measuring Out-of-Scope Actions on Benign Tasks

Yubin Qu, Ying Zhang, Yanjun Zhang, Gelei Deng +3 more

The paper introduces OverEager-Gen, a new benchmark that measures 'overeager actions'—where coding agents perform unauthorized tasks beyond a benign request—and finds that removing explicit consent de…

cs.CR · cs.AI · Apr 1, 2026

VibeGuard: A Security Gate Framework for AI-Generated Code

Ying Xie

The paper introduces VibeGuard, a pre-publish security gate framework designed to detect novel vulnerabilities—such as source map exposure and packaging drift—that arise from developers over-relying o…

cs.CR · cs.AI · May 14, 2026

Do Coding Agents Understand Least-Privilege Authorization?

Zheng Yan, Jingxiang Weng, Charles Chen, Dengyun Peng +8 more

The paper introduces a new benchmark and decomposition method, Sufficiency-Tightness Decomposition, demonstrating that current coding agents struggle to accurately infer least-privilege authorization,…

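The summary names Sufficiency-Tightness Decomposition without defining it. A minimal sketch under one assumed reading: sufficiency asks whether an agent's inferred permission set covers everything the task requires, and tightness asks how much of what it grants is actually required. The function `sufficiency_tightness`, the permission labels, and the ratio-based scores are hypothetical, not the paper's method.

```python
from typing import FrozenSet, Tuple


def sufficiency_tightness(inferred: FrozenSet[str],
                          required: FrozenSet[str]) -> Tuple[float, float]:
    """Hypothetical scoring of an inferred permission set.

    sufficiency: share of required permissions that the inferred set grants.
    tightness:   share of granted permissions that are actually required
                 (1.0 means nothing extra is granted).
    """
    overlap = len(inferred & required)
    # An empty requirement is vacuously satisfied.
    sufficiency = overlap / len(required) if required else 1.0
    # Granting nothing cannot over-grant.
    tightness = overlap / len(inferred) if inferred else 1.0
    return sufficiency, tightness


required = frozenset({"fs.read:src/", "fs.write:src/"})
inferred = frozenset({"fs.read:src/", "fs.write:src/", "net.outbound"})
print(sufficiency_tightness(inferred, required))
```

Here the inferred set is fully sufficient but not tight, because `net.outbound` is granted without being needed. Keeping the two scores separate shows whether an agent errs by under-granting or by over-granting.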

cs.CR · cs.AI · May 10, 2026

The Authorization-Execution Gap Is a Major Safety and Security Problem in Open-World Agents

Baoyuan Wu, Qingshan Liu, Adel Bibi, Irwin King +1 more

The paper argues that the Authorization-Execution Gap (AEG)—the divergence between intended authorization and actual execution—is a critical safety and security flaw in open-world agents, requiring so…

cs.CR · cs.AI · cs.SE · May 5, 2026

MOSAIC-Bench: Measuring Compositional Vulnerability Induction in Coding Agents

Jonathan Steinberg, Oren Gal

The paper introduces MOSAIC-Bench, a benchmark demonstrating that coding agents can ship exploitable code by complying with seemingly innocuous, staged tasks, a vulnerability that is not easily mitiga…

cs.CR · cs.AI · Mar 19, 2026

Agent Control Protocol: Admission Control for Agent Actions

Marcelo Fernandez

The paper introduces Agent Control Protocol (ACP), a stateful temporal admission control mechanism that enforces behavioral properties over execution traces to prevent harmful patterns from individual…

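Stateful temporal admission control differs from per-action checks because a decision depends on what was admitted earlier in the trace. A minimal sketch, assuming a rule format of forbidden ordered pairs (a "first" action followed later by a "then" action within a sliding window); `AdmissionController`, the rule format, and the action names are hypothetical, not ACP's actual specification.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Tuple


@dataclass
class AdmissionController:
    """Hypothetical stateful admission controller.

    It remembers the most recent admitted actions and denies an action that
    would complete a forbidden ordered pair (first, then) within the last
    `window` admitted actions, even if the action is harmless on its own.
    """
    forbidden: List[Tuple[str, str]]
    window: int = 10
    history: Deque[str] = field(default_factory=deque)

    def admit(self, action: str) -> bool:
        for first, then in self.forbidden:
            # The pattern fires only if `first` was admitted earlier in the window.
            if action == then and first in self.history:
                return False  # deny; denied actions are not recorded
        self.history.append(action)
        if len(self.history) > self.window:
            self.history.popleft()  # keep only the recent part of the trace
        return True


ctl = AdmissionController(forbidden=[("read_secret", "http_post")], window=5)
print([ctl.admit(a) for a in ["read_file", "read_secret", "http_post"]])
```

Each action in the example would pass a stateless allowlist; only the sequence `read_secret` then `http_post` is rejected. The window bounds memory, at the cost of forgetting patterns that span more than `window` admitted actions.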

cs.CR · cs.SE · May 4, 2026

A Validated Prompt Bank for Malicious Code Generation: Separating Executable Weapons from Security Knowledge in 1,554 Consensus-Labeled Prompts

Richard J. Young, Gregory D. Moody

The paper introduces a validated, consensus-labeled prompt bank that separates requests for executable malicious code (weapons) from requests for general harmful security knowledge, providing a more g…

cs.CR · cs.AI · cs.CL · Jun 3, 2026

Domain-Conditioned Safety in Frontier Computer-Using Agents: A 793-Episode Browser Benchmark, a Coding-Domain Cross-Reference, and a Reproducibility Audit of Recent Red-Teaming

Nicholas Saban

The paper benchmarks current frontier computer-using agents against hand-crafted attacks, finding that while they are highly safe in browser tasks, this safety does not generalize to other domains lik…

cs.CR · May 7, 2026

SkillScope: Toward Fine-Grained Least-Privilege Enforcement for Agent Skills

Jiangrong Wu, Yuhong Nan, Yixi Lin, Huaijin Wang +3 more

SkillScope introduces a graph-based framework to enforce fine-grained least-privilege in LLM Agent Skills, significantly reducing over-privileged actions while maintaining task functionality.

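The summary says only that SkillScope is graph-based. As an illustrative assumption, least privilege for a skill can be approximated by walking a call graph from the skill's entry point and collecting the permissions of every reachable function; anything granted beyond that set is over-privileged. The graph, the permission labels, and `required_permissions` are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set


def required_permissions(entry: str,
                         calls: Dict[str, List[str]],
                         perms: Dict[str, Set[str]]) -> Set[str]:
    """Hypothetical: union of the permissions of all functions reachable
    from `entry` in the call graph, found by breadth-first search."""
    seen = {entry}
    queue = deque([entry])
    needed: Set[str] = set()
    while queue:
        fn = queue.popleft()
        needed |= perms.get(fn, set())
        for callee in calls.get(fn, []):
            if callee not in seen:  # visit each function once, even with cycles
                seen.add(callee)
                queue.append(callee)
    return needed


calls = {"summarize": ["read_doc", "format"], "upload": ["http_post"]}
perms = {"read_doc": {"fs.read"}, "http_post": {"net.send"}}
granted = {"fs.read", "net.send"}
needed = required_permissions("summarize", calls, perms)
print(needed, granted - needed)
```

The `summarize` skill never reaches `http_post`, so `net.send` shows up as an excess grant that a least-privilege policy would drop.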

cs.NI · cs.AI · cs.CR · May 12, 2026

Large Language Models for Agentic NetOps and AIOps: Architectures, Evaluation, and Safety

Muhammad Bilal, Jon Crowcroft, Ruizhi Wang, Xiaolong Xu +1 more

The paper surveys the use of LLMs for agentic NetOps and AIOps, arguing that operational reliability depends not on the model itself, but on robust surrounding machinery and workflow-centered evaluati…

cs.CR · Apr 18, 2026

False Security Confidence in Benign LLM Code Generation

Xiaolei Ren

The paper introduces False Security Confidence (FSC), a new metric that measures how prevalent security vulnerabilities are in LLM-generated code that is otherwise functionally correct, eve…

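The summary does not give FSC's formula. One plausible formulation, assumed here for illustration: among generated samples that pass their functional tests, FSC is the fraction that still contain a vulnerability, so a high value means passing tests give false reassurance. The `Sample` fields, the function name, and the NaN convention for an empty denominator are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Sample:
    passes_tests: bool  # functional correctness, e.g. from unit tests
    vulnerable: bool    # e.g. flagged by a static analyzer or manual audit


def false_security_confidence(samples: Iterable[Sample]) -> float:
    """Hypothetical FSC: share of functionally correct samples that are
    still vulnerable. Undefined (NaN) when no sample passes its tests."""
    correct = [s for s in samples if s.passes_tests]
    if not correct:
        return math.nan
    return sum(s.vulnerable for s in correct) / len(correct)


samples = [
    Sample(passes_tests=True, vulnerable=False),
    Sample(passes_tests=True, vulnerable=True),
    Sample(passes_tests=True, vulnerable=False),
    Sample(passes_tests=False, vulnerable=True),  # excluded: fails its tests
]
print(false_security_confidence(samples))
```

Conditioning on functional correctness is the point of the metric as summarized: the failing sample is ignored even though it is vulnerable, because nobody would ship it on the strength of its tests.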

cs.SE · cs.AI · Empirical · Jun 16, 2026

All Smoke, No Alarm: Oracle Signals in Agent-Authored Test Code

Dipayan Banik, Kowshik Chowdhury, Shazibul Islam Shamim

This paper characterizes oracle signals in test files of agent-authored pull requests and assesses their impact on merge outcomes.

cs.CR · cs.AI · May 27, 2026

AIRGuard: Guarding Agent Actions with Runtime Authority Control

Suliu Qin, Haomin Zhuang, Yujun Zhou, Yufei Han +1 more

AIRGuard is a runtime authority control guard that operationalizes least privilege to prevent language agents from executing unauthorized side effects, significantly reducing attack success rates on a…

cs.CR · cs.AI · May 27, 2026

AIRGuard: Guarding Agent Actions with Runtime Authority Control

Suliu Qin, Haomin Zhuang, Yujun Zhou, Yufei Han +1 more

AIRGuard is a runtime authority control guard that operationalizes least privilege to prevent agent attacks by enforcing step-level authorization over external side effects.

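A minimal sketch of step-level authorization over external side effects, assuming each tool call declares its side-effect kind and target, and that grants are glob patterns per side-effect kind. `ToolCall`, `authorize_step`, and the grant format are hypothetical and not AIRGuard's actual interface.

```python
from dataclasses import dataclass
from fnmatch import fnmatch
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ToolCall:
    tool: str
    side_effect: Optional[str]  # e.g. "fs.write", "net.send"; None for pure reads
    target: str


def authorize_step(call: ToolCall, grants: Dict[str, List[str]]) -> bool:
    """Hypothetical step-level check: calls without external side effects pass;
    a side-effecting call needs a grant whose glob pattern matches its target."""
    if call.side_effect is None:
        return True
    patterns = grants.get(call.side_effect, [])  # no grant for this kind => deny
    return any(fnmatch(call.target, p) for p in patterns)


grants = {"fs.write": ["src/*"]}
steps = [
    ToolCall("read_file", None, "README.md"),
    ToolCall("write_file", "fs.write", "src/app.py"),
    ToolCall("write_file", "fs.write", ".github/workflows/ci.yml"),
    ToolCall("http_post", "net.send", "https://example.com"),
]
print([authorize_step(s, grants) for s in steps])
```

The check runs per step, so a write outside `src/` or any network send is stopped before it happens rather than detected afterwards. Note that `fnmatch`'s `*` also matches `/`, so `src/*` admits nested paths such as `src/pkg/mod.py`; a real policy might need stricter path matching.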

cs.SE · cs.AI · May 28, 2026

Automating Low-Risk Code Review at Meta: RADAR, Risk Calibration, and Review Efficiency

Chris Adams, Arjun Singh Banga, Parveen Bansal, Souvik Bhattacharya +26 more

The paper introduces RADAR, a risk-aware automated code review system, demonstrating that it can significantly reduce review bottlenecks and improve efficiency for AI-generated code without compromisi…

cs.AI · cs.SE · Empirical · Jul 16, 2026

Proof-or-Stop: Don't Trust the Agent, Trust the Evidence -- Loop Engineering for Verifiable Evidence-Gated Lifecycle Control

Jek Huang, Jeffery Hsia, Jiayi Sun, Freddie Shi +2 more

This paper introduces Proof-or-Stop Lifecycle Control, a method that allows lifecycle transitions only when mechanically verifiable evidence is provided, and evaluates its implementation.


cs.CR · cs.AI · Mar 30, 2026

Evaluating Privilege Usage of Agents with Real-World Tools

Quan Zhang, Lianhang Fu, Lvsi Lian, Gwihwan Go +4 more

The paper introduces GrantBox, a new security sandbox that evaluates how well LLM agents handle real-world tool privileges, finding that agents remain highly vulnerable to sophisticated attacks.

cs.CR · cs.AI · cs.SE · May 11, 2026

Comment and Control: Hijacking Agentic Workflows via Context-Grounded Evolution

Neil Fendley, Zhengyu Liu, Aonan Guan, Jiacheng Zhong +1 more

The paper introduces JAW, a novel framework that demonstrates how adversaries can hijack agentic workflows on automation platforms like GitHub Actions by manipulating inputs based on context-grounded…

cs.CR · cs.AI · Apr 29, 2026

Enforcing Benign Trajectories: A Behavioral Firewall for Structured-Workflow AI Agents

Hung Dang

The paper proposes a behavioral firewall that uses a parameterized deterministic finite automaton (pDFA) to enforce verified benign tool-call sequences and parameter bounds for…

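A minimal sketch of a parameterized DFA used as a tool-call firewall, assuming a toy refund workflow: each edge is keyed by (state, tool name) and carries a guard on the call's parameters. The workflow, the refund bound of 100, and the `ParamDFA` class are hypothetical; how the paper builds and verifies its automaton is not shown here.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Guard = Callable[[dict], bool]


class ParamDFA:
    """Hypothetical parameterized DFA: a tool call is allowed only if an edge
    (state, tool) exists and that edge's parameter guard accepts the call."""

    def __init__(self, start: str, accepting: Set[str],
                 edges: Dict[Tuple[str, str], Tuple[str, Guard]]) -> None:
        self.start = start
        self.accepting = accepting
        self.edges = edges

    def step(self, state: str, tool: str, params: dict) -> Optional[str]:
        edge = self.edges.get((state, tool))
        if edge is None or not edge[1](params):
            return None  # unknown transition or parameter out of bounds
        return edge[0]

    def accepts(self, calls: List[Tuple[str, dict]]) -> bool:
        state = self.start
        for tool, params in calls:
            next_state = self.step(state, tool, params)
            if next_state is None:
                return False  # block at the first disallowed call
            state = next_state
        return state in self.accepting


edges = {
    ("start", "lookup_order"): ("found", lambda p: True),
    ("found", "issue_refund"): ("refunded", lambda p: 0 < p.get("amount", 0) <= 100),
    ("refunded", "send_email"): ("done", lambda p: True),
}
fw = ParamDFA("start", {"done"}, edges)
ok = [("lookup_order", {"id": 7}), ("issue_refund", {"amount": 40}), ("send_email", {})]
too_big = [("lookup_order", {"id": 7}), ("issue_refund", {"amount": 500})]
skipped = [("issue_refund", {"amount": 40})]
print(fw.accepts(ok), fw.accepts(too_big), fw.accepts(skipped))
```

At runtime, a firewall would more likely call `step` on each incoming tool call and block as soon as it returns `None`, rather than waiting for the full trajectory; `accepts` is the offline version of the same check.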

cs.CR · cs.AI · Apr 12, 2026

The Blind Spot of Agent Safety: How Benign User Instructions Expose Critical Vulnerabilities in Computer-Use Agents

Xuwei Ding, Skylar Zhai, Linxin Song, Jiate Li +5 more

The paper introduces OS-BLIND, a benchmark demonstrating that current safety evaluations fail to detect critical vulnerabilities in computer-use agents when user instructions are benign, showing high…
