Papers similar to 2605.29965

~ similar to 2605.29965· 19 results

cs.AIcs.LORecentMay 29, 2026

Answer-Set-Programming-based Abstractions for Reinforcement Learning

Rafael Bankosegger, Thomas Eiter, Johannes Oetsch

This paper proposes using Answer-Set Programming (ASP) to implement and evaluate CARCASS abstractions, demonstrating a promising method for constructing powerful abstractions for Reinforcement Learnin…

View →

cs.PLcs.AIRecentMay 29, 2026

SEMBridge: Tagless-Final Program Semantics with Weakest-Precondition and Bounded-Checking Interpretations

Eric Liang

SEMBridge is a tagless-final framework that allows a single executable object program to generate multiple program semantics, including weakest-precondition and bounded-checking interpretations, ensur…

View →

cs.AIRecentMay 28, 2026

Transforming and Encoding FTS for SAT Solving: What Helps, What Hurts (Extended Version)

João Filipe, Álvaro Torralba, Gregor Behnke

This paper investigates various methods for encoding factored tasks, a compact planning representation, into propositional logic for use with SAT solvers, analyzing the impact of encoding choices and…

View →

cs.AIRecentMay 28, 2026

Harmonizing Real-Time Constraints and Long-Horizon Reasoning: An Asynchronous Agentic Framework for Dynamic Scheduling

Shijie Cao, Yuan Yuan, Jing Liu

RACE-Sched is an asynchronous agentic framework that successfully integrates low-latency, real-time scheduling decisions with advanced, long-horizon reasoning provided by Large Language Models.

View →

cs.CLRecentMay 31, 2026

Robust Asynchronous Planning via Auto-Formalization

Jiayi Zhang, Jianing Yin, Ben Zhou, Li Zhang

The paper introduces new benchmarks for complex asynchronous planning and demonstrates that general constraint satisfaction formalizers (like CP-SAT) significantly outperform direct LLM planning or tr…

View →

cs.PLcs.CCcs.DBRecentJun 1, 2026

From Time to Space: The Impact of Linearity in Higher-Order Datalog

Angelos Charalambidis, Babis Kostopoulos, Panos Rondogiannis

The paper analyzes a fragment of Higher-Order Datalog, showing that restricting recursion to a linear form shifts its expressive power from time complexity to space complexity, specifically capturing…

View →

cs.CLRecentJun 1, 2026

CRAFTQA: A Code-Driven Adaptive Framework for Complex Structured Data Reasoning

Chengtao Gan, Zhiqiang Liu, Long Jin, Yushan Zhu +2 more

CRAFTQA introduces a novel adaptive, code-driven framework that significantly enhances complex structured data reasoning by dynamically generating custom code functions beyond predefined operations.

View →

cs.CLcs.AIRecentMay 31, 2026

TimeSage-MT: A Multi-Turn Benchmark for Evaluating Agentic Time Series Reasoning

Yaxuan Kong, Qingren Yao, Yuqi Nie, Yichen Li +6 more

The paper introduces TimeSage-MT, a comprehensive multi-turn benchmark designed to rigorously test an LLM agent's ability to perform complex, evolving time series analysis, revealing critical gaps in…

View →

cs.AIcs.LGRecentMay 30, 2026

MOSAIC: Modular Orchestration for Structured Agentic Intelligence and Composition

Yifan Bao, Xinyu Xi, Xinyu Liu, Wen Ge +7 more

MOSAIC introduces a structured agentic framework that treats automated data science as a staged, context-grounded model selection problem, improving performance and traceability over traditional AutoM…

View →

cs.AIcs.LORecentMay 28, 2026

Reliable Reasoning with Large Language Models via Preference-Based Maximum Satisfiability

Pedro Orvalho, Marta Kwiatkowska, Guillem Alenyà, Felip Manyà

The paper proposes a hybrid reasoning framework where Large Language Models (LLMs) generate code to encode complex optimization problems into a preference-based Maximum Satisfiability (MaxSAT) format,…

View →

cs.AIRecentMay 27, 2026

You Live More Than Once: Towards Hierarchical Skill Meta-Evolving

Xujun Li, Kehan Zheng, Mingyuan Zhao, Yize Geng +6 more

The paper proposes HiSME, a lightweight hierarchical skill meta-evolving solution that jointly optimizes skills and the skill evolving strategy by learning meta-skills from task execution traces, lead…

View →

cs.CLRecentMay 29, 2026

Mellum2 Technical Report

Marko Kojic, Ivan Bondyrev, Aral de Moor, Joseph Shtok +5 more

Mellum 2 is an open-weight 12B Mixture-of-Experts (MoE) language model specialized for software engineering, achieving performance competitive with larger models while maintaining the efficiency of a…

View →

cs.AIRecentMay 27, 2026

OR-Space: A Full-Lifecycle Workspace Benchmark for Industrial Optimization Agents

Chenyu Zhou, Xinyun Lu, Jiangyue Zhao, Jianghao Lin +2 more

The paper introduces OR-Space, a novel full-lifecycle workspace benchmark designed to rigorously evaluate industrial optimization agents by simulating real-world, multi-stage OR workflows that go beyo…

View →

cs.AIRecentMay 29, 2026

MAVEN: Improving Generalization in Agentic Tool Calling

Omkar Ghugarkar, Vishvesh Bhat, Muhammad Ahmed Mohsin, Asad Aali

The paper introduces MAVEN, a lightweight symbolic reasoning scaffold that significantly improves the generalization and end-to-end success rate of large language models in complex, multi-step tool-ca…

View →

cs.AIcs.CLRecentMay 28, 2026

Notation Matters: A Benchmark Study of Token-Optimized Formats in Agentic AI Systems

Lorenz Kutschka, Bernhard Geiger

This study benchmarks token-optimized formats (TOON and TRON) against JSON in end-to-end agentic AI systems, finding that TRON significantly reduces token overhead with minimal performance degradation…

View →

cs.AIcs.CLcs.LORecentMay 27, 2026

Satisfiability Solving with LLMs: A Matched-Pair Evaluation of Reasoning Capability

Leizhen Zhang, Shuhan Chen, Sheng Chen

The paper evaluates LLM reasoning on Boolean satisfiability (SAT) problems, concluding that conventional metrics are misleading and proposing a paired-formula protocol with Accurate Differentiation Ra…

View →

cs.DBcs.AIEmpiricalRecentJun 10, 2026

TAHOE: Text-to-SQL with Automated Hint Optimization from Experience

Zhiyi Chen, Jie Song, Peng Li

The paper presents Tahoe, a system that optimizes Text-to-SQL performance through dynamic data management and hint learning.

View →

cs.CLRecentJun 1, 2026

HarnessForge: Joint Harness and Policy Evolution for Adaptive Agent Systems

Mingju Chen, Can Lv, Guibin Zhang, Heng Chang +1 more

HarnessForge introduces a meta-adaptive framework that jointly evolves the execution structure (harness) and the reasoning policy of LLM agents, significantly improving overall system performance acro…

View →

cs.CRcs.AIcs.MARecentMay 1, 2026

Skills as Verifiable Artifacts: A Trust Schema and a Biconditional Correctness Criterion for Human-in-the-Loop Agent Runtimes

Alfredo Metere

The paper proposes a trust schema and verification framework to ensure that agent skills, which augment LLMs, are rigorously verified before deployment, thereby making human-in-the-loop oversight scal…

View →