20 results for “Understanding of computer automation, GUI agents, visual memory”
CS papers onlyHybrid search: Keyword + semantic, ranked by combined score.ⓘ
Want pure semantic search? Try claim verification →
Seoyoung Choi, Minseok Ko, Hyunseok Lee, Kunwoong Kim +3 more
This paper introduces a taxonomy of GUI agent failures and finds that full-image memory has divergent effects on failure distribution. It proposes Action-Grounded Visual Memory (AGMem) as an effective…
Garvin Guo, Donglei Yu, Yu Chen, Xiang Wang +5 more
The paper argues that observed gains in multimodal agents using tools may be due to learning tool-calling patterns rather than genuine capability expansion, finding that tool access provides little co…
Pritam Dash, Tongyu Ge, Aditi Jain, Tanmay Shah +1 more
This paper systematically studies memory poisoning attacks in LLM agents, identifying multiple vulnerabilities and proposing a new benchmark to assess the risk.
This survey establishes persistent, writable memory as an independent security problem for LLM agents, proposing a comprehensive framework for 'mnemonic sovereignty' to govern the entire memory lifecy…
This paper investigates the forensic analysis of agentic AI systems using OpenClaw, proposing an agent artifact taxonomy and highlighting the challenges posed by non-determinism in agent-mediated exec…
Sina Mavali, David Pape, Jonathan Evertz, Samira Abedini +4 more
The paper introduces the Task Alignment Benchmark (TAB) to evaluate terminal agents' ability to selectively follow relevant environmental instructions while ignoring misleading distractors, revealing…
The paper introduces Parallax, an architectural framework that structurally separates AI reasoning from action execution to ensure robust safety for autonomous agents, achieving high attack mitigation…
The paper introduces FP-Agent, a classifier that demonstrates that while browser fingerprints are poor discriminators for AI browsing agents, behavioral fingerprints (like typing and scrolling pattern…
The paper proposes Multi-Agent Computer Use (MACU) systems, which significantly improve performance on complex, long-horizon tasks by enabling parallel execution and dynamic task decomposition compare…
Wenkui Yang, Chao Jin, Haisu Zhu, Weilin Luo +6 more
The paper introduces Semantic-level UI Element Injection, a novel red-teaming technique that overlays misleading UI elements onto screenshots to significantly improve the attack success rate against s…
The paper introduces MemCog, a Memory-as-Cognition system that integrates memory access directly into the reasoning process, significantly improving agent performance, especially in proactive memory r…
Yanqiu Zhao, Dongying Zheng, Kaibo Huang, Yukun Wei +2 more
MaskClaw is an edge-side privacy arbitrator that protects sensitive data in GUI agent screenshots by combining local visual evidence, task-specific policies, and a skill-evolution mechanism.
This paper introduces 'Visual Inception,' a novel attack that poisons long-term memory in agentic recommender systems using images, and proposes CognitiveGuard, a dual-process defense framework to mit…
Shizuo Tian, Xiaohong Weng, Rui Kong, Yuxuan Chen +8 more
The JAMEL framework addresses the challenge of effective exploration in open-ended environments by jointly training agent memory and exploration policies using natural, novelty-driven signals.
Yechao Zhang, Shiqian Zhao, Jie Zhang, Gelei Deng +4 more
The paper identifies that background 'heartbeat' execution in personal AI agents like Claw can silently pollute the agent's memory with external misinformation, influencing user behavior without the u…
Nahyun Lee, Dongkeun Yoon, Guijin Son, Geewook Kim +11 more
The paper introduces K-BrowseComp, a new web-browsing agent benchmark of 400 problems grounded in Korean contexts, demonstrating that current frontier LLMs struggle significantly with complex, context…
AgentWall is a runtime safety layer that intercepts and evaluates all proposed actions from local AI agents against a declarative policy, ensuring safety before execution.
Ruoqi Guo, Yi Liu, Gelei Deng, Yiheng Xiong +6 more
The paper introduces MIRAGE, a novel pipeline that generates context-aware prompt injection attacks by embedding malicious text into user-generated content regions of mobile screenshots, successfully…
Ruoqi Guo, Yi Liu, Gelei Deng, Yiheng Xiong +6 more
The paper introduces MIRAGE, a novel pipeline that generates context-aware prompt injection attacks by injecting malicious text into user-generated content regions of mobile screenshots, successfully…
Xinhao Song, Su Su, Sirui Song, Hongliang Wu +5 more
The paper introduces HLL, a benchmark that tests if multimodal agents can successfully substitute for human verification (like CAPTCHA) in complex, real-world workflows, finding that current agents ar…