Papers similar to 2606.00674

~ similar to 2606.00674· 20 results

cs.AIcs.CLcs.LGRecentMay 29, 2026

The Deterministic Horizon: When Extended Reasoning Fails and Tool Delegation Becomes Necessary

The paper demonstrates that extended pure neural reasoning fails on complex, deterministic state-tracking tasks beyond a certain 'Deterministic Horizon,' necessitating the integration of external tool…

View →

cs.CLcs.IRRecentJun 3, 2026

Caliper: Probing Lexical Anchors versus Causal Structure in LLMs

Zhenyu Yu, Shuigeng Zhou

This paper evaluates the causal reasoning abilities of large language models and finds that they rely heavily on lexical pattern matching rather than structural reasoning.

View →

cs.AIRecentMay 27, 2026

Bridging the Detection-to-Abstention Gap in Reasoning Models under Insufficient Information

Renjie Gu, Jiaxu Li, Yihao Wang, Yun Yue +7 more

The paper addresses the 'detection-to-abstention gap' in reasoning models, where detecting insufficient information does not lead to abstention, by proposing a novel control framework that forces mode…

View →

cs.AIcs.LGstat.MLRecentMay 31, 2026

Transferring Information Across Interventions in Causal Bayesian Optimization

Mohammad Ali Javidian

The paper proposes graph-coupled causal Bayesian optimization, a method that improves efficiency by sharing information across related interventions through a shared set of causal parameters.

View →

cs.AIRecentJun 1, 2026

Token Predictors Are Not Planners: Building Physically Grounded Causal Reasoners

Zheng Lu, Mingqi Gao, Qinlei Xie, Wanqi Zhong +7 more

The paper argues that current embodied planning benchmarks prioritize superficial language prediction over true physical reasoning, introducing new benchmarks and a large-scale dataset to demonstrate…

View →

cs.CLcs.AIRecentMay 28, 2026

Predicting Causal Effects from Natural Language Queries using Structured Representations

Giuliano Martinelli, Piriyakorn Piriyatamwong, Abelardo Carlos Martinez Lorenzo, Jasmin Baier +6 more

The paper introduces Query2Effect, a large-scale benchmark, and a two-step framework to predict causal effect sizes from natural language queries, showing that structured representation significantly…

View →

cs.LGcs.AIcs.CLRecentJun 3, 2026

Failed Reasoning Traces Tell You What Is Fixable (But Not by Reading Them)

Nizar Islah, Istabrak Abbes, Irina Rish, Sarath Chandar +1 more

This paper proposes a method to recover recoverability structure from failed traces of post-trained language models, enabling test-time routing and post-training analysis.

View →

cs.CLcs.LGRecentMay 31, 2026

Unlocking the Black Box of Latent Reasoning: An Interpretability-Guided Approach to Intervention

Shuochen Chang, Tong Bai, Xiaofeng Zhang, Qianli Ma +4 more

This paper introduces interpretability-guided, training-free interventions that systematically improve the accuracy and controllability of latent reasoning in LLMs by leveraging structural and causal…

View →

cs.AIcs.CLRecentMay 27, 2026

Revealing Algorithmic Deductive Circuits for Logical Reasoning

Phuong Minh Nguyen, Tien Huu Dang, Naoya Inoue

This paper localizes the attention heads within LLMs responsible for specific reasoning steps, finding that specialized heads handle factual retrieval while higher layers manage global information int…

View →

cs.MAcs.AIRecentMay 28, 2026

Unifying Temporal and Structural Credit Assignment in LLM-Based Multi-Agent Prompt Optimization

Wenwu Li, Yuran Song, Mingze Zhao, Bo Jin +1 more

The paper proposes a novel temporal and structural credit assignment framework to efficiently optimize multi-agent LLM systems by decomposing the error signal and using targeted, discrete gradient upd…

View →

cs.AIRecentMay 31, 2026

Expected Value Alignment for Generative Reward Modeling in Formal Mathematics Verification

Shihao Ji, Haotao Tan, Zihui Song, Mingyu Li

The paper introduces Expected Value Alignment (EVA), a novel reward modeling procedure that allows continuous scoring of intermediate reasoning steps in formal mathematics verification while maintaini…

View →

cs.LGcs.AIcs.CLRecentJun 3, 2026

Reinforcement Learning from Rich Feedback with Distributional DAgger

Rishabh Agrawal, Jacob Fein-Ashley, Paria Rashidinejad

This paper proposes a new imitation learning algorithm called DistIL that uses distributional feedback to improve policy improvement and regret guarantees.

View →

cs.AIRecentMay 29, 2026

Distilling LLM Feedback for Lean Theorem Proving

Gaetan Narozniak, Gérard Biau, Rémi Munos, Ahmad Rammal +1 more

The paper introduces Feedback Distillation, a novel training method that uses a language model's privileged feedback to provide token-level supervision, significantly improving complex reasoning tasks…

View →

cs.AIcs.LGRecentMay 29, 2026

Diagnosing Failure Modes of Shared-State Collaboration in Resource-Constrained Visual Agents

Yunpeng Zhou

This paper analyzes failure modes in collaborative visual reasoning systems, demonstrating that naive shared workspaces can amplify hallucinations and proposing diagnostics for improving communication…

View →

cs.AIcs.CLcs.LGRecentMay 28, 2026

When Should Models Change Their Minds? Contextual Belief Management in Large Language Models

Haoming Xu, Weihong Xu, Zongrui Li, Mengru Wang +5 more

The paper introduces Contextual Belief Management (CBM) to address how LLMs should manage accumulating information over long interactions, showing that reinforcement learning significantly improves be…

View →

cs.AIRecentMay 27, 2026

From Fact Overwriting to Knowledge Evolution: Causal Editing via On-Policy Self-Distillation

Shuaike Li, Kai Zhang, Xianquan Wang, Jiachen Liu +1 more

The paper introduces Causal Editing (CODE), a new paradigm that improves knowledge updates in LLMs by grounding fact injection in causal narratives, drastically reducing self-refutation rates.

View →

cs.CLEmpiricalRecentJun 4, 2026

Human Adults and LLMs as Scientists: Who Benefits from Active Exploration?

Mandana Samiei, Eunice Yiu, Anthony GX-Chen, Dongyan Lin +4 more

This paper investigates whether adults' struggles with conjunctive causal rules persist when they have agency through active exploration.

View →

cs.CLEmpiricalRecentJun 4, 2026

Human Adults and LLMs as Scientists: Who Benefits from Active Exploration?

Mandana Samiei, Eunice Yiu, Anthony GX-Chen, Dongyan Lin +4 more

This paper investigates whether adults' struggles with conjunctive causal rules persist when they have agency through active exploration.

View →

cs.LGstat.MLRecentJun 4, 2026

Causal Atlases from Entropic Inference: Bayesian Networks beyond Optimal DAGs

Hazhir Aliahmadi, Irina Babayan, Greg van Anders

This paper introduces an entropy-based method to generate multiple plausible causal maps (atlases) that accurately reflect the inherent structural ambiguity in complex systems, moving beyond single, o…

View →

cs.CLcs.AIcs.LGRecentJun 1, 2026

Off-the-Shelf LLMs as Process Scorers: Training-Free Alternative to PRMs for Mathematical Reasoning

Atoosa Chegini, Soheil Feizi

The paper introduces Chunk-Level Guided Generation, a training-free method that uses an off-the-shelf large language model (LLM) as a process scorer to guide small model generation, achieving performa…

View →