~ similar to 2606.00981· 18 results
This paper empirically demonstrates that the choice of plan representation (e.g., checklist vs. narrative) significantly impacts the robustness and success rate of LLM-based web agents.
This paper introduces the first LLM-generated, domain-independent heuristics for symbolic AI planning, using evolutionary search to surpass the performance of hand-engineered state-of-the-art methods.
This paper investigates various methods for encoding factored tasks, a compact planning representation, into propositional logic for use with SAT solvers, analyzing the impact of encoding choices and…
RACE-Sched is an asynchronous agentic framework that successfully integrates low-latency, real-time scheduling decisions with advanced, long-horizon reasoning provided by Large Language Models.
Kou Shi, Ziao Zhang, Shiting Huang, Avery Nie +6 more
The paper introduces AsyncTool, a new benchmark designed to evaluate LLM agents' ability to handle multiple, concurrent tasks with delayed tool feedback, demonstrating that asynchronous coordination i…
PlanGuard is a training-free defense framework that uses an isolated Planner and hierarchical verification to defend LLM agents against Indirect Prompt Injection by verifying the consistency of planne…
The paper introduces containment verification, a novel method that provides safety guarantees by formally verifying the agentic framework itself, ensuring safety regardless of the underlying AI model'…
The paper evaluates LLM reasoning on Boolean satisfiability (SAT) problems, concluding that conventional metrics are misleading and proposing a paired-formula protocol with Accurate Differentiation Ra…
The paper introduces MAVEN, a lightweight symbolic reasoning scaffold that significantly improves the generalization and end-to-end success rate of large language models in complex, multi-step tool-ca…
The paper introduces a novel LLM-driven evolutionary framework to synthesize admissible, domain-specific pattern generators, enabling optimal classical planning with high performance and interpretabil…
The paper proposes an efficient inference procedure for generative planning models by modifying the Open-Closed List (OCL) search, achieving superior performance over existing baselines.
Jadelynn Dao, Milan Ganai, Yasmina Abukhadra, Ajay Sridhar +6 more
This paper introduces DIRECT, a routing framework that allocates test-time compute per prompt to improve the success--cost Pareto frontier for embodied agents.
Zheng Lu, Mingqi Gao, Qinlei Xie, Wanqi Zhong +7 more
The paper argues that current embodied planning benchmarks prioritize superficial language prediction over true physical reasoning, introducing new benchmarks and a large-scale dataset to demonstrate…
Ahmad Rammal, Niket Patel, Fabian Gloeckle, Amaury Hayat +4 more
The paper introduces AutoformBot, a multi-agent system that successfully autoformalizes a large corpus of open-access graduate-level mathematics textbooks into a verified library in Lean 4, demonstrat…
The paper proposes the Policy-Execution-Authorization (PEA) architecture, a separation-of-powers system designed to structurally enforce goal integrity in AI agents, moving safety from a probabilistic…
The paper proposes DecomposeR, a planner-centric framework that structures deep research into typed Directed Acyclic Graphs (DAGs) to explicitly improve the planning and execution of large language mo…
Agentproof is a system that provides static, pre-deployment verification of safety properties in agent workflow graphs by automatically extracting a unified graph model and applying structural and tem…
The paper proposes a hybrid reasoning framework where Large Language Models (LLMs) generate code to encode complex optimization problems into a preference-based Maximum Satisfiability (MaxSAT) format,…