Similar to 2606.12386 · 20 results
Przemyslaw Biecek, Luca Longo, Jianlong Zhou, Thomas Fel +2 more
The paper advocates for the establishment of Model Science, a systematic discipline that moves beyond simple benchmarking to deeply analyze AI models' internal workings and failure modes.
Mandana Samiei, Eunice Yiu, Anthony GX-Chen, Dongyan Lin +4 more
This paper investigates whether adults' struggles with conjunctive causal rules persist when they have agency through active exploration.
MOOSE-Copilot is a novel web-based framework that unifies scientific hypothesis discovery by formalizing human-AI interaction, significantly improving performance over autonomous LLM baselines.
The paper proposes an agentic pipeline for spatial reasoning by introducing a dynamic cognitive map and Spatial Assertion Codes (SAC), achieving state-of-the-art performance on complex spatial tasks.
The paper proposes a category-theoretic framework for agentic AI that models scientific discovery not as answer generation, but as a verifiable transition and revision of the underlying representation…
MOSAIC introduces a structured agentic framework that treats automated data science as a staged, context-grounded model selection problem, improving performance and traceability over traditional AutoM…
Shizuo Tian, Xiaohong Weng, Rui Kong, Yuxuan Chen +8 more
The JAMEL framework addresses the challenge of effective exploration in open-ended environments by jointly training agent memory and exploration policies using natural, novelty-driven signals.
Tianzhuo Yang, Zihan Shen, Zirui Mi, Zhaoyi Zhang +6 more
The paper introduces MiraBench, a new benchmark that evaluates the action-conditioned reliability of robotic world models, finding that visual fidelity is insufficient and that optimism bias is a perv…
Yaoming Li, Guangxiang Zhao, Qilong Shi, Lin Sun +2 more
This paper synthesizes over 150 scattered studies and reports to provide the first comprehensive primer on post-training reasoning data, organizing the field around data objects, utility, construction…
Ruiyi Zhang, Peijia Qin, Qi Cao, Li Zhang +1 more
The paper introduces AIBuildAI-2, a knowledge-enhanced agent that significantly improves the automatic building of AI models by integrating an external, evolving knowledge system, achieving state-of-t…
Jiakang Li, Guanyu Zhu, Can Jin, Chenxi Huang +7 more
The paper introduces Latent Reward Steering (LRS), an adaptive inference-time framework that implicitly improves the reasoning ability of LLMs by guiding the model's internal latent states based on a…
MolLingo is a multi-agent system that significantly improves automated molecular design by integrating domain-specific chemical reasoning and structural context into LLMs, outperforming state-of-the-a…
The paper introduces 'probe trajectories'—a continuous measure of a concept's probability across a model's reasoning process—to improve the monitoring of Large Reasoning Models' future behavior, showi…
Yiming Ren, Yiran Xu, Zicheng Lin, Chufan Shi +7 more
The paper proposes S2L-PO, a framework that uses smaller, naturally diverse models as structured explorers to enhance the policy-level diversity and performance of larger language models during traini…
The paper introduces 'layered mutability,' a framework for analyzing how persistent self-modifying AI agents drift away from intended behavior due to the accumulation of locally reasonable, uncoordina…
Weile Chen, Bingchen Miao, Qifan Yu, Wendong Bu +5 more
The paper proposes SCALE, a self-improving web agent framework that uses adversarial roles and graph exploration to autonomously discover agent limitations and enhance adaptability in complex web envi…
The paper formally addresses the challenging question of cross-domain transferability of latent predictive models by proposing a structured framework that quantifies the relationship between source an…
This paper provides a detailed message-passing scheme for EFE-based planning and clarifies the corrections needed for cross-entropy planning and full EFE-based planning.
This paper simulates the Argumentative Theory of Reasoning (ATR) using multi-agent debate among LLMs, demonstrating that collective adversarial discourse significantly enhances truth-seeking performan…