Papers similar to 2606.00145

18 results

cs.CR · Recent · Jun 2, 2026

Same Weights, Different Robot: A Deployment Safety View of VLA Policies

The paper identifies a 'deployment-safety gap' in Vision-Language-Action (VLA) policies, showing that identical model checkpoints can result in physically different and unsafe robot actions due to act…

cs.CL · Recent · May 29, 2026

MineExplorer: Evaluating Open-World Exploration of MLLM Agents in Minecraft

Tianjie Ju, Yueqing Sun, Zheng Wu, Wei Zhang +6 more

The paper introduces MineExplorer, a new benchmark in Minecraft, to evaluate the sustained open-world exploration capabilities of MLLM agents, finding that long-horizon coordination remains a signific…

cs.AI · Recent · May 29, 2026

Closed-Loop Neural Activation Control in Vision-Language-Action Models

Abhijith Babu, Ramneet Kaur, Nathaniel D. Bastian, Olivera Kotevska +4 more

The paper proposes CTRL-STEER, a closed-loop framework that adaptively adjusts intervention strength to stabilize concept regulation and improve task success in Vision-Language-Action models without r…

cs.CL, cs.CR · Recent · Apr 1, 2026

One Word at a Time: Incremental Completion Decomposition Breaks LLM Safety

Samee Arif, Naihao Deng, Zhijing Jin, Rada Mihalcea

The paper introduces Incremental Completion Decomposition (ICD), a novel jailbreak strategy that successfully bypasses LLM safety mechanisms by eliciting malicious content through a sequence of single…

cs.AI, cs.CR, cs.CY · Recent · Apr 16, 2026

Layered Mutability: Continuity and Governance in Persistent Self-Modifying Agents

Krti Tallam

The paper introduces 'layered mutability,' a framework for analyzing how persistent self-modifying AI agents drift away from intended behavior due to the accumulation of locally reasonable, uncoordina…

cs.SE, cs.AI, cs.CL · Recent · May 28, 2026

RePoT: Recoverable Program-of-Thought via Checkpoint Repair

Parsa Mazaheri

The paper introduces RePoT, a method that significantly improves Program-of-Thought (PoT) planning by deterministically verifying the initial plan prefix and using a single LLM call to resume planning…

cs.AI, cs.CR · Recent · Apr 26, 2026

Structural Enforcement of Goal Integrity in AI Agents via Separation-of-Powers Architecture

Rong Xiang

The paper proposes the Policy-Execution-Authorization (PEA) architecture, a separation-of-powers system designed to structurally enforce goal integrity in AI agents, moving safety from a probabilistic…

cs.AI · Recent · May 28, 2026

BenchTrace: A Benchmark for Testing Reflection Ability and Controlled Evolution in LLM Agents

Jiahao Huang, Fei Cheng, Junfeng Jiang, Zefan Yu +1 more

The paper introduces BenchTrace, a novel benchmark designed to rigorously evaluate the self-evolution and reflection capabilities of LLM agents, revealing that current models struggle with accurate fa…

cs.AI · Recent · May 27, 2026

AsyncTool: Evaluating the Asynchronous Function Calling Capability under Multi-Task Scenarios

Kou Shi, Ziao Zhang, Shiting Huang, Avery Nie +6 more

The paper introduces AsyncTool, a new benchmark designed to evaluate LLM agents' ability to handle multiple, concurrent tasks with delayed tool feedback, demonstrating that asynchronous coordination i…

cs.CR · Recent · May 2, 2026

Ghost in the Context: Measuring Policy-Carriage Failures in Decision-Time Assembly

Igor Santos-Grueiro

The paper identifies and measures a critical failure mode where LLM agents violate policies by losing or corrupting directive-bearing state during the process of assembling the decision context, and p…

cs.RO, cs.AI, eess.SY · Recent · May 30, 2026

PaCo-VLA: Passivity-Shielded Compliance Prior for Contact-Rich Vision-Language-Action Manipulation

Haofan Cao, Zhaoyang Li, Zhichao You, Liang Guo +1 more

PaCo-VLA introduces a passivity-shielded compliance prior to safely bridge the gap between high-level Vision-Language-Action (VLA) semantic outputs and low-level, force-sensitive robotic control.

cs.AI · Recent · Jun 1, 2026

TERRA: Task-Embedded Reasoning and Representation Architecture for Cross-Domain Applications

Shayan Shokri

The paper formally addresses the challenging question of cross-domain transferability of latent predictive models by proposing a structured framework that quantifies the relationship between source an…

cs.CL, cs.AI · Recent · May 30, 2026

Skill or Skip? Learning Selective Skill Invocation in Agentic Tasks via Dual-Granularity Preference Learning

Chishui Chen, Jiaye Lin, Te Sun, Junxi Wang +5 more

SelSkill introduces a dual-granularity preference learning framework that treats skill use as a 'skill-or-skip' decision, significantly improving agent performance and execution precision in complex a…

cs.RO, cs.AI · Recent · May 31, 2026

Beyond Task Success: Behavioral and Representational Diagnostics for WAM and VLA

Hung Mai, Bin Zhu, Tuan Do

The paper introduces a diagnostic framework to determine if World-Action Models (WAMs) provide genuinely actionable behavioral improvements beyond simply achieving task success, finding that WAMs ofte…

cs.CR, cs.AI · Recent · May 7, 2026

LoopTrap: Termination Poisoning Attacks on LLM Agents

Huiyu Xu, Zhibo Wang, Wenhui Zhang, Ziqi Zhu +3 more

The paper introduces LoopTrap, an automated red-teaming framework that demonstrates how malicious prompts can poison the termination judgment of LLM agents, causing unbounded computation.

cs.RO, cs.AI, cs.LG · Recent · Jun 4, 2026

HANDOFF: Humanoid Agentic Task-Space Whole-Body Control via Distilled Complementary Teachers

Lizhi Yang, Junheng Li, Nehar Poddar, Yiling Hou +4 more

This paper proposes a compact, explicit interface for humanoid robots that enables diverse manipulation skills and demonstrates its feasibility through natural-language-driven task roll-outs.

cs.LO, cs.AI, cs.CR · Recent · Apr 19, 2026

Atomic Decision Boundaries: A Structural Requirement for Guaranteeing Execution-Time Admissibility in Autonomous Systems

Marcelo Fernandez

The paper introduces the concept of the atomic decision boundary, proving that for autonomous systems to guarantee execution-time admissibility, the decision and the resulting state transition must oc…

cs.CV, cs.AI · Recent · May 28, 2026

VisualThink-VLA: Visual Intermediate Reasoning for Effective and Low-Latency Vision-Language-Action Policies

Mingjian Gao, Wenqiao Zhang, Yuqian Yuan, Yang Dai +8 more

VisualThink-VLA introduces a visual intermediate-reasoning framework that guides action prediction using compact visual evidence, achieving high accuracy and significantly low latency for real-time Vi…
