~ similar to 2606.00674· 20 results
The paper demonstrates that extended pure neural reasoning fails on complex, deterministic state-tracking tasks beyond a certain 'Deterministic Horizon,' necessitating the integration of external tool…
This paper evaluates the causal reasoning abilities of large language models and finds that they rely heavily on lexical pattern matching rather than structural reasoning.
Renjie Gu, Jiaxu Li, Yihao Wang, Yun Yue +7 more
The paper addresses the 'detection-to-abstention gap' in reasoning models, where detecting insufficient information does not lead to abstention, by proposing a novel control framework that forces mode…
The paper proposes graph-coupled causal Bayesian optimization, a method that improves efficiency by sharing information across related interventions through a shared set of causal parameters.
Zheng Lu, Mingqi Gao, Qinlei Xie, Wanqi Zhong +7 more
The paper argues that current embodied planning benchmarks prioritize superficial language prediction over true physical reasoning, introducing new benchmarks and a large-scale dataset to demonstrate…
Giuliano Martinelli, Piriyakorn Piriyatamwong, Abelardo Carlos Martinez Lorenzo, Jasmin Baier +6 more
The paper introduces Query2Effect, a large-scale benchmark, and a two-step framework to predict causal effect sizes from natural language queries, showing that structured representation significantly…
Nizar Islah, Istabrak Abbes, Irina Rish, Sarath Chandar +1 more
This paper proposes a method to recover recoverability structure from failed traces of post-trained language models, enabling test-time routing and post-training analysis.
Shuochen Chang, Tong Bai, Xiaofeng Zhang, Qianli Ma +4 more
This paper introduces interpretability-guided, training-free interventions that systematically improve the accuracy and controllability of latent reasoning in LLMs by leveraging structural and causal…
This paper localizes the attention heads within LLMs responsible for specific reasoning steps, finding that specialized heads handle factual retrieval while higher layers manage global information int…
Wenwu Li, Yuran Song, Mingze Zhao, Bo Jin +1 more
The paper proposes a novel temporal and structural credit assignment framework to efficiently optimize multi-agent LLM systems by decomposing the error signal and using targeted, discrete gradient upd…
The paper introduces Expected Value Alignment (EVA), a novel reward modeling procedure that allows continuous scoring of intermediate reasoning steps in formal mathematics verification while maintaini…
This paper proposes a new imitation learning algorithm called DistIL that uses distributional feedback to improve policy improvement and regret guarantees.
Gaetan Narozniak, Gérard Biau, Rémi Munos, Ahmad Rammal +1 more
The paper introduces Feedback Distillation, a novel training method that uses a language model's privileged feedback to provide token-level supervision, significantly improving complex reasoning tasks…
This paper analyzes failure modes in collaborative visual reasoning systems, demonstrating that naive shared workspaces can amplify hallucinations and proposing diagnostics for improving communication…
Haoming Xu, Weihong Xu, Zongrui Li, Mengru Wang +5 more
The paper introduces Contextual Belief Management (CBM) to address how LLMs should manage accumulating information over long interactions, showing that reinforcement learning significantly improves be…
Shuaike Li, Kai Zhang, Xianquan Wang, Jiachen Liu +1 more
The paper introduces Causal Editing (CODE), a new paradigm that improves knowledge updates in LLMs by grounding fact injection in causal narratives, drastically reducing self-refutation rates.
Mandana Samiei, Eunice Yiu, Anthony GX-Chen, Dongyan Lin +4 more
This paper investigates whether adults' struggles with conjunctive causal rules persist when they have agency through active exploration.
Mandana Samiei, Eunice Yiu, Anthony GX-Chen, Dongyan Lin +4 more
This paper investigates whether adults' struggles with conjunctive causal rules persist when they have agency through active exploration.
This paper introduces an entropy-based method to generate multiple plausible causal maps (atlases) that accurately reflect the inherent structural ambiguity in complex systems, moving beyond single, o…
The paper introduces Chunk-Level Guided Generation, a training-free method that uses an off-the-shelf large language model (LLM) as a process scorer to guide small model generation, achieving performa…