ArXivCSExplorer
☆☆Bookmarks🏆RSSHow to UseFAQ
Built with and by Teycir Ben Soltane•
How to Use•FAQ•GitHub•arXiv.org•
Share:

~ similar to 2606.00981· 18 results

cs.CLcs.AIcs.LGRecentMay 28, 2026

Does The Way You Plan Matter? An Empirical Study of Planning Representations for LLM Web Agents

Alejandra Zambrano, Sara Vera Marjanovic, Imene Kerboua, Xing Han Lù +1 more

This paper empirically demonstrates that the choice of plan representation (e.g., checklist vs. narrative) significantly impacts the robustness and success rate of LLM-based web agents.

View →
cs.AIRecentMay 28, 2026

LLM-Evolved Domain-Independent Heuristics for Symbolic AI Planning

Elliot Gestrin, Jendrik Seipp

This paper introduces the first LLM-generated, domain-independent heuristics for symbolic AI planning, using evolutionary search to surpass the performance of hand-engineered state-of-the-art methods.

View →
cs.AIRecentMay 28, 2026

Transforming and Encoding FTS for SAT Solving: What Helps, What Hurts (Extended Version)

João Filipe, Álvaro Torralba, Gregor Behnke

This paper investigates various methods for encoding factored tasks, a compact planning representation, into propositional logic for use with SAT solvers, analyzing the impact of encoding choices and…

View →
cs.AIRecentMay 28, 2026

Harmonizing Real-Time Constraints and Long-Horizon Reasoning: An Asynchronous Agentic Framework for Dynamic Scheduling

Shijie Cao, Yuan Yuan, Jing Liu

RACE-Sched is an asynchronous agentic framework that successfully integrates low-latency, real-time scheduling decisions with advanced, long-horizon reasoning provided by Large Language Models.

View →
cs.AIRecentMay 27, 2026

AsyncTool: Evaluating the Asynchronous Function Calling Capability under Multi-Task Scenarios

Kou Shi, Ziao Zhang, Shiting Huang, Avery Nie +6 more

The paper introduces AsyncTool, a new benchmark designed to evaluate LLM agents' ability to handle multiple, concurrent tasks with delayed tool feedback, demonstrating that asynchronous coordination i…

View →
cs.CRRecentApr 11, 2026

PlanGuard: Defending Agents against Indirect Prompt Injection via Planning-based Consistency Verification

Guangyu Gong, Zizhuang Deng

PlanGuard is a training-free defense framework that uses an isolated Planner and hierarchical verification to defend LLM agents against Indirect Prompt Injection by verifying the consistency of planne…

View →
cs.AIcs.CRcs.SERecentMay 9, 2026

Containment Verification: AI Safety Guarantees Independent of Alignment

Royce Moon, Lav R. Varshney

The paper introduces containment verification, a novel method that provides safety guarantees by formally verifying the agentic framework itself, ensuring safety regardless of the underlying AI model'…

View →
cs.AIcs.CLcs.LORecentMay 27, 2026

Satisfiability Solving with LLMs: A Matched-Pair Evaluation of Reasoning Capability

Leizhen Zhang, Shuhan Chen, Sheng Chen

The paper evaluates LLM reasoning on Boolean satisfiability (SAT) problems, concluding that conventional metrics are misleading and proposing a paired-formula protocol with Accurate Differentiation Ra…

View →
cs.AIRecentMay 29, 2026

MAVEN: Improving Generalization in Agentic Tool Calling

Omkar Ghugarkar, Vishvesh Bhat, Muhammad Ahmed Mohsin, Asad Aali

The paper introduces MAVEN, a lightweight symbolic reasoning scaffold that significantly improves the generalization and end-to-end success rate of large language models in complex, multi-step tool-ca…

View →
cs.AIRecentJun 1, 2026

LLM-Evolved Pattern Generators for Optimal Classical Planning

Windy Phung, Dominik Drexler, Arnaud Lequen, Jendrik Seipp

The paper introduces a novel LLM-driven evolutionary framework to synthesize admissible, domain-specific pattern generators, enabling optimal classical planning with high performance and interpretabil…

View →
cs.AIRecentMay 30, 2026

Efficient Test-time Inference for Generative Planning Models

Robert Gieselmann, Mihai Samson, Federico Pecora, Jeremy L. Wyatt

The paper proposes an efficient inference procedure for generative planning models by modifying the Open-Closed List (OCL) search, achieving superior performance over existing baselines.

View →
cs.ROcs.AIcs.CVEmpiricalRecentJun 10, 2026

DIRECT: When and Where Should You Allocate Test-Time Compute in Embodied Planners?

Jadelynn Dao, Milan Ganai, Yasmina Abukhadra, Ajay Sridhar +6 more

This paper introduces DIRECT, a routing framework that allocates test-time compute per prompt to improve the success--cost Pareto frontier for embodied agents.

View →
cs.AIRecentJun 1, 2026

Token Predictors Are Not Planners: Building Physically Grounded Causal Reasoners

Zheng Lu, Mingqi Gao, Qinlei Xie, Wanqi Zhong +7 more

The paper argues that current embodied planning benchmarks prioritize superficial language prediction over true physical reasoning, introducing new benchmarks and a large-scale dataset to demonstrate…

View →
cs.AIRecentMay 28, 2026

Formalizing Mathematics at Scale

Ahmad Rammal, Niket Patel, Fabian Gloeckle, Amaury Hayat +4 more

The paper introduces AutoformBot, a multi-agent system that successfully autoformalizes a large corpus of open-access graduate-level mathematics textbooks into a verified library in Lean 4, demonstrat…

View →
cs.AIcs.CRRecentApr 26, 2026

Structural Enforcement of Goal Integrity in AI Agents via Separation-of-Powers Architecture

Rong Xiang

The paper proposes the Policy-Execution-Authorization (PEA) architecture, a separation-of-powers system designed to structurally enforce goal integrity in AI agents, moving safety from a probabilistic…

View →
cs.AIRecentMay 29, 2026

Planner-Centric Reinforcement Learning for Deep Research with Structure-Aware Reward

Mustafa Anis Hussain, Xinle Wu, Yao Lu

The paper proposes DecomposeR, a planner-centric framework that structures deep research into typed Directed Acyclic Graphs (DAGs) to explicitly improve the planning and execution of large language mo…

View →
cs.LOcs.CRcs.FLRecentMar 20, 2026

Agentproof: Static Verification of Agent Workflow Graphs

Melwin Xavier, Vaisakh M A, Melveena Jolly, Midhun Xavier

Agentproof is a system that provides static, pre-deployment verification of safety properties in agent workflow graphs by automatically extracting a unified graph model and applying structural and tem…

View →
cs.AIcs.LORecentMay 28, 2026

Reliable Reasoning with Large Language Models via Preference-Based Maximum Satisfiability

Pedro Orvalho, Marta Kwiatkowska, Guillem Alenyà, Felip Manyà

The paper proposes a hybrid reasoning framework where Large Language Models (LLMs) generate code to encode complex optimization problems into a preference-based Maximum Satisfiability (MaxSAT) format,…

View →