~ similar to 2605.31433· 20 results
Zhenting Qi, Susanna Maria Baby, Stefanie Anna Baby, Kan Yuan +4 more
The paper investigates the limits of self-evolution in LLM reasoning under closed-loop settings, finding that while self-improvement is significant, it consistently falls short of perfect oracle super…
Rongzhi Zhang, Rui Feng, Zhihan Zhang, Jingfeng Yang +7 more
QUBRIC introduces a co-design framework that simultaneously optimizes queries and rubrics, overcoming the bottleneck of vague rubrics derived from open-ended questions, leading to significant gains in…
Wangyi Mei, Zhouhong Gu, Zhenhan Bai, Yin Cai +8 more
The paper proposes Deep Research as Rubric (DR-rubric), a novel evidence-driven framework that treats rubric construction itself as a research problem to generate fine-grained, scalable reward signals…
Tomer Keren, Nitay Calderon, Asaf Yehudai, Yotam Perlitz +2 more
The paper introduces TASTE, an automatic task synthesis method that generates challenging agent benchmarks by evolving tool sequences, demonstrating that existing benchmarks are saturated and that TAS…
Xiaobo Wang, Tong Wu, Min Tang, Jiaqi Li +2 more
The paper introduces SAVE, a framework that uses on-policy feedback and the value function to self-supervise and improve reward models, significantly enhancing RLHF performance across multiple benchma…
Yangzhen Wu, Aaron J. Li, Wenjie Ma, Li Cao +9 more
BenchEvolver introduces a solution-centric evolutionary framework to automatically transform saturated coding benchmarks into significantly harder, high-quality, and diverse evaluation suites.
Jiahao Huang, Fei Cheng, Junfeng Jiang, Zefan Yu +1 more
The paper introduces BenchTrace, a novel benchmark designed to rigorously evaluate the self-evolution and reflection capabilities of LLM agents, revealing that current models struggle with accurate fa…
Tao Feng, Tianyang Luo, Jingjun Xu, Zhigang Hua +4 more
ExpWeaver introduces a novel framework for LLM agents to learn from past experiences using latent retrieval-augmented generation, achieving state-of-the-art performance while significantly improving t…
Xinyu Liu, Darryl Cherian Jacob, Yang Zhou, Jindong Wang +1 more
The OISD framework improves language model reasoning by distilling on-policy predictive signals from the final output layer to intermediate representations, leading to substantial improvements on math…
GRASP introduces a gated, regression-aware framework for improving LLM agents by ensuring that every proposed skill edit improves performance on a balanced probe without degrading previously learned c…
The paper introduces Contrastive Reflection (CORE), a novel non-parametric method that rapidly improves language model reasoning by distilling contrasts between successful and unsuccessful problem att…
COMAP introduces a novel co-evolutionary framework that simultaneously updates textual world models and agent policies through closed-loop interaction, significantly improving long-horizon decision-ma…
Qiming Shi, Zhaolu Kang, Yunfan Zhou, Di Weng +1 more
SPADER is a novel reinforcement learning framework that addresses the challenges of Multi-Answer Question Answering by improving credit assignment and promoting diverse exploration during long-horizon…
Tao Feng, Chongrui Ye, Tianyang Luo, Jingjun Xu +7 more
ExpGraph is a model-agnostic framework that uses a self-evolving experience graph to enable LLM agents to reuse past successful strategies and failure lessons, significantly improving performance acro…
LongTraceRL addresses long-context reasoning challenges by generating highly challenging training data and introducing a fine-grained rubric reward, significantly improving evidence-grounded reasoning…
Shangheng Du, Xiangchao Yan, Jinxin Shi, Zongsheng Cao +10 more
MLEvolve is a novel self-evolving multi-agent framework that enables LLM agents to discover and optimize machine learning algorithms for complex, long-horizon tasks.
Gaetan Narozniak, Gérard Biau, Rémi Munos, Ahmad Rammal +1 more
The paper introduces Feedback Distillation, a novel training method that uses a language model's privileged feedback to provide token-level supervision, significantly improving complex reasoning tasks…
The paper introduces OCC-RAG, a family of compact, task-specialized Small Language Models (SLMs) designed to achieve highly faithful, multi-hop question answering grounded strictly in provided context…
The paper proposes an unsupervised Reinforcement Learning approach that enforces cross-lingual self-consistency to significantly enhance the multilingual reasoning capabilities of large language model…
Jian Mu, Tianyi Lin, Chengwei Qin, Zhongxiang Dai +1 more
DRIFT proposes a novel framework that efficiently optimizes LLMs for multi-turn interactions by decoupling rollout from optimization, allowing the use of weighted supervised fine-tuning to match the p…