Papers similar to 2605.29534

~ similar to 2605.29534· 20 results

cs.CRcs.AIRecentMar 24, 2026

AgentRAE: Remote Action Execution through Notification-based Visual Backdoors against Screenshots-based Mobile GUI Agents

Yutao Luo, Haotian Zhu, Shuchao Pang, Zhigang Lu +3 more

The paper introduces AgentRAE, a novel backdoor attack that successfully forces mobile GUI agents to execute remote actions using visually natural triggers found in system notifications, achieving hig…

View →

cs.CLRecentMay 29, 2026

ExpGraph: Model-Agnostic Experience Learning with Graph-Structured Memory for LLM Agents

Tao Feng, Chongrui Ye, Tianyang Luo, Jingjun Xu +7 more

ExpGraph is a model-agnostic framework that uses a self-evolving experience graph to enable LLM agents to reuse past successful strategies and failure lessons, significantly improving performance acro…

View →

cs.CRcs.CLRecentMay 27, 2026

MaskClaw: Edge-Side Personalized Privacy Arbitration for GUI Agents with Behavior-Driven Skill Evolution

Yanqiu Zhao, Dongying Zheng, Kaibo Huang, Yukun Wei +2 more

MaskClaw is an edge-side privacy arbitrator that protects sensitive data in GUI agent screenshots by combining local visual evidence, task-specific policies, and a skill-evolution mechanism.

View →

cs.AIRecentMay 29, 2026

Learning to Adapt: Self-Improving Web Agent via Cognitive-Aware Exploration

Weile Chen, Bingchen Miao, Qifan Yu, Wendong Bu +5 more

The paper proposes SCALE, a self-improving web agent framework that uses adversarial roles and graph exploration to autonomously discover agent limitations and enhance adaptability in complex web envi…

View →

cs.LGcs.AIcs.CLRecentJun 1, 2026

OpenWebRL: Demystifying Online Multi-turn Reinforcement Learning for Visual Web Agents

Rui Yang, Qianhui Wu, Yuxi Chen, Hao Bai +6 more

The paper introduces OpenWebRL, an open framework that enables training visual web agents using online multi-turn Reinforcement Learning directly on live websites, achieving state-of-the-art performan…

View →

cs.LOcs.CRcs.FLRecentMar 20, 2026

Agentproof: Static Verification of Agent Workflow Graphs

Melwin Xavier, Vaisakh M A, Melveena Jolly, Midhun Xavier

Agentproof is a system that provides static, pre-deployment verification of safety properties in agent workflow graphs by automatically extracting a unified graph model and applying structural and tem…

View →

cs.AIcs.CLRecentMay 27, 2026

Adaptive Multimodal Agents-Based Framework for Automatic Workflow Execution

Susanna Cifani, Mario Luca Bernardi, Marta Cimitile

The paper proposes a novel multimodal multi-agent framework that uses a topological knowledge graph to enable robust, adaptive automatic workflow execution, overcoming the limitations of treating task…

View →

cs.CLRecentJun 1, 2026

CRAB-Bench: Evaluating LLM Agents under Complex Task Dependencies and Human-aligned User Simulation

Danqing Wang, Akshay Sivaraman, Lei Li

The paper introduces CRAB-Bench and RUSE, a rigorous evaluation framework that tests LLM agents on complex, interdependent tasks with realistic human user interactions, revealing significant performan…

View →

cs.SEcs.AIRecentMay 28, 2026

GUITestScape: Towards Open-set Evaluation on Exploratory GUI Testing

Xiaoyi Chen, Yifei Gao, Yang Xu, Xingxing Song +2 more

The paper introduces GUITestScape, a comprehensive benchmark for exploratory GUI testing, and GUIJudge, an open-set evaluator that significantly improves the assessment of AI agents' defect detection…

View →

cs.CLRecentMay 29, 2026

EMBGuard: Constructing Hazard-Aware Guardrails for Safe Planning in Embodied Agents

Dongwook Choi, Taeyoon Kwon, Bogyung Jeong, Minju Kim +5 more

EMBGuard introduces a novel, MLLM-based safety guardrail that explicitly identifies and explains physical hazards from (visual observation, action) pairs, enabling safer planning for embodied agents.

View →

cs.CLRecentJun 1, 2026

K-BrowseComp: A Web Browsing Agent Benchmark Grounded in Korean Contexts

Nahyun Lee, Dongkeun Yoon, Guijin Son, Geewook Kim +11 more

The paper introduces K-BrowseComp, a new web-browsing agent benchmark of 400 problems grounded in Korean contexts, demonstrating that current frontier LLMs struggle significantly with complex, context…

View →

cs.CRcs.AIcs.CLRecentMay 14, 2026

Web Agents Should Adopt the Plan-Then-Execute Paradigm

Julien Piet, Annabella Chow, Yiwei Hou, Muxi Lyu +4 more

The paper argues that web agents should abandon the reactive ReAct paradigm in favor of a plan-then-execute approach, which requires developing typed, task-level APIs to properly structure web interac…

View →

cs.LGcs.AIcs.CRRecentJun 2, 2026

RUBAS: Rubric-Based Reinforcement Learning for Agent Safety

Xian Qi Loye, Qinglin Su, Zhexin Zhang, Shiyao Cui +4 more

The paper introduces RUBAS, a rubric-based reinforcement learning framework that improves agent safety by providing fine-grained, multi-dimensional rewards for complex tool-use scenarios.

View →

cs.CLRecentJun 1, 2026

Unified Context Evolution for LLM Agents

Zixuan Zhu, Yitong Hu, Yong Dai, Junfeng Fang +3 more

The paper introduces Unified Context Evolution (UCE), a gradient-free framework that externalizes and manages agent experience into a typed, evolving library, significantly improving performance on mu…

View →

cs.AIcs.CRRecentMar 24, 2026

AgentWall: A Runtime Safety Layer for Local AI Agents

Ashwin Aravind

AgentWall is a runtime safety layer that intercepts and evaluates all proposed actions from local AI agents against a declarative policy, ensuring safety before execution.

View →

cs.AIRecentMay 29, 2026

TraceGraph: Shared Decision Landscapes for Diagnosing and Improving Agent Trajectories

Junjie Nian, Kang Chen, Ge Zhang, Yixin Cao +1 more

TraceGraph introduces a graph-based framework to map agent decision-making across pooled trajectories, revealing hidden differences in agent behavior and improving performance by targeting known failu…

View →

cs.CRcs.LGRecentApr 25, 2026

A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework

Kexin Chu

The paper proposes the Layered Attack Surface Model (LASM), a structural taxonomy that maps security threats and defenses across the complex, multi-layered architecture of AI agents, revealing signifi…

View →

cs.AIcs.CRRecentApr 13, 2026

Mobile GUI Agent Privacy Personalization with Trajectory Induced Preference Optimization

Zhixin Lin, Jungang Li, Dongliang Xu, Shidong Pan +4 more

The paper proposes Trajectory Induced Preference Optimization (TIPO) to improve mobile GUI agent personalization by explicitly modeling and optimizing for privacy-related behavioral differences in exe…

View →

cs.CLcs.AIRecentMay 27, 2026

VibeSearchBench: Benchmarking Long-horizon Proactive Search in the Wild

Xiaohongshu Inc

The paper introduces VibeSearchBench, a new benchmark designed to evaluate long-horizon, proactive search capabilities, demonstrating that current state-of-the-art LLM agents are still significantly i…

View →

cs.CLcs.AIcs.LGRecentMay 28, 2026

Does The Way You Plan Matter? An Empirical Study of Planning Representations for LLM Web Agents

Alejandra Zambrano, Sara Vera Marjanovic, Imene Kerboua, Xing Han Lù +1 more

This paper empirically demonstrates that the choice of plan representation (e.g., checklist vs. narrative) significantly impacts the robustness and success rate of LLM-based web agents.

View →