Papers similar to 2603.28551v2

~ similar to 2603.28551v2· 20 results

cs.CRRecentMay 22, 2026

Security, Privacy, and Ethical Risks in OpenClaw

Yutong Jin, Zelin Zhang, Zhijin Lyu, Jianbing Ni

This paper analyzes the security, privacy, and ethical risks associated with OpenClaw, a locally executable AI agent system, concluding that these risks pose major barriers to its trustworthy deployme…

View →

cs.SEcs.AIcs.CRRecentMay 30, 2026

When Safe Skills Collide: Measuring Compositional Risk in Agent Skill Ecosystems

Su Wang, Pin Qian, Yihang Chen, Junxian You +5 more

The paper introduces SkillReact, a framework that measures compositional risk in agent skill ecosystems, finding that even if individual skills are safe, their combination can create significant, unad…

View →

cs.SEcs.AIcs.CRRecentMay 30, 2026

When Safe Skills Collide: Measuring Compositional Risk in Agent Skill Ecosystems

Su Wang, Pin Qian, Yihang Chen, Junxian You +5 more

View →

cs.CRcs.AIcs.CLRecentMay 12, 2026

SkillSafetyBench: Evaluating Agent Safety under Skill-Facing Attack Surfaces

Chang Jin, An Wang, Zeming Wei, Kai Wang +6 more

The paper introduces SkillSafetyBench, a comprehensive benchmark demonstrating that agent safety failures often stem from adversarial influences within reusable skills and execution environments, rath…

View →

cs.CRcs.AIRecentMay 30, 2026

Benchmarking Security Risk Detection and Verification in Open Agentic Skill Ecosystems

Ismail Hossain, Sai Puppala, Zhuoran Lu, Sajedul Talukder +1 more

The paper introduces SkillVetBench, a novel two-stage benchmark that effectively detects and verifies malicious behavior in open agentic skill ecosystems, significantly outperforming existing static a…

View →

cs.CRcs.AIRecentMay 30, 2026

Benchmarking Security Risk Detection and Verification in Open Agentic Skill Ecosystems

Ismail Hossain, Sai Puppala, Zhuoran Lu, Sajedul Talukder +1 more

The paper introduces SkillVetBench, a novel two-stage benchmark that effectively detects and verifies malicious behavior hidden within open agentic skills, significantly outperforming static and seman…

View →

cs.CRcs.AIRecentMar 22, 2026

When Convenience Becomes Risk: A Semantic View of Under-Specification in Host-Acting Agents

Di Lu, Yongzhi Liao, Xutong Mu, Lele Zheng +4 more

The paper identifies that the convenience of host-acting agents leads to semantic under-specification in user goals, which forces the agent to generate potentially risky execution plans.

View →

cs.CRcs.AIRecentApr 7, 2026

Foundations for Agentic AI Investigations from the Forensic Analysis of OpenClaw

Jan Gruber, Jan-Niclas Hilgert

This paper investigates the forensic analysis of agentic AI systems using OpenClaw, proposing an agent artifact taxonomy and highlighting the challenges posed by non-determinism in agent-mediated exec…

View →

cs.CRcs.CLRecentMay 31, 2026

BraveGuard: From Open-World Threats to Safer Computer-Use Agents

Yunhao Feng, Xiaohu Du, Xinhao Deng, Yifan Ding +12 more

BraveGuard is a self-evolving defense framework that significantly improves the safety monitoring of computer-use agents by generating guard model supervision from open-world threat discovery and real…

View →

cs.CRcs.CLRecentMay 31, 2026

BraveGuard: From Open-World Threats to Safer Computer-Use Agents

Yunhao Feng, Yifan Ding, Xiaohu Du, Ming Wen +12 more

BraveGuard is a self-evolving defense framework that improves the safety of computer-use agents by training guard models on open-world, multi-step threat trajectories rather than static benchmarks.

View →

cs.CRcs.AIRecentJun 3, 2026

From Agent Traces to Trust: Evidence Tracing and Execution Provenance in LLM Agents

Yiqi Wang, Jiaqi Zhang, Taotao Cai, Zirui Liu +5 more

This survey provides a systematic framework and taxonomy for evidence tracing and execution provenance in LLM agents, addressing the difficulty of verifying and auditing complex agent behaviors.

View →

cs.CLRecentJun 1, 2026

SkillHarm: Lifecycle-Aware Skill-Based Attacks via Automated Construction

Yuting Ning, Zhehao Zhang, Yash Kumar Lal, Boyu Gou +7 more

The paper introduces SkillHarm, a comprehensive benchmark and automated framework for evaluating skill-based attacks across the entire agent skill-use lifecycle, demonstrating that current agents rema…

View →

cs.CRcs.AIRecentApr 12, 2026

The Blind Spot of Agent Safety: How Benign User Instructions Expose Critical Vulnerabilities in Computer-Use Agents

Xuwei Ding, Skylar Zhai, Linxin Song, Jiate Li +5 more

The paper introduces OS-BLIND, a benchmark demonstrating that current safety evaluations fail to detect critical vulnerabilities in computer-use agents when user instructions are benign, showing high…

View →

cs.CRcs.AIRecentApr 3, 2026

A Systematic Security Evaluation of OpenClaw and Its Variants

Yuhang Wang, Haichang Gao, Zhenxing Niu, Zhaoxiang Liu +3 more

The paper systematically evaluates six OpenClaw-series AI agent frameworks, demonstrating that these agentized systems possess significant security vulnerabilities that are distinct from and more seve…

View →

cs.CRcs.AIRecentApr 30, 2026

Security Attack and Defense Strategies for Autonomous Agent Frameworks: A Layered Review with OpenClaw as a Case Study

Luyao Xu, Xiang Chen

This paper provides a systematic, layered review of security risks and defense strategies for autonomous agent frameworks, using OpenClaw as a case study to address the current lack of integrated rese…

View →

cs.CRcs.AIRecentMay 13, 2026

AgentTrap: Measuring Runtime Trust Failures in Third-Party Agent Skills

Haomin Zhuang, Hanwen Xing, Yujun Zhou, Yuchen Ma +4 more

The paper introduces AgentTrap, a dynamic benchmark that measures LLM agent susceptibility to malicious side effects embedded within seemingly benign third-party skills, finding that agents often exec…

View →

cs.CRRecentMay 15, 2026

From AI-Generated Content to Agentic Action: Security and Safety Threats in Generative AI

Zelin Zhang, Qi Li, Jie Cao, Lingshuang Liu +1 more

The paper analyzes the escalating security and safety threats posed by generative AI systems as they transition from merely generating content to executing real-world actions via tools and agents, fin…

View →

cs.CRRecentMay 25, 2026

AgentSecBench: Measuring Prompt Injection, Privacy Leakage, and Tool-Use Integrity in LLM Agents

Faruk Alpay, Taylan Alpay

The paper introduces AgentSecBench, a security evaluation framework that measures prompt injection, privacy leakage, and tool-use integrity in LLM agents by defining formal security games and testing…

View →

cs.CRcs.AIRecentMay 11, 2026

Red-Teaming Agent Execution Contexts: Open-World Security Evaluation on OpenClaw

Hongwei Yao, Yiming Liu, Yiling He, Bingrun Yang

The paper introduces DeepTrap, an automated framework that evaluates security vulnerabilities in agentic language models by manipulating their internal execution contexts, demonstrating that task comp…

View →

cs.CRcs.SERecentMar 22, 2026

SkillProbe: Security Auditing for Emerging Agent Skill Marketplaces via Multi-Agent Collaboration

Zihan Guo, Zhiyu Chen, Xiaohang Nie, Jianghao Lin +2 more

The paper proposes SkillProbe, a multi-agent security auditing framework, demonstrating that high-popularity skills in LLM agent marketplaces are often insecure due to systemic combinatorial risks.

View →