~ similar to 2606.01552· 20 results
Rongsheng Zhang, Jiji Tang, Junnan Ren, Zuyi Bao +5 more
The paper introduces DynSess, a novel session-level framework that evaluates and optimizes role-playing agents by assessing long-horizon conversational quality, significantly outperforming existing tu…
The paper introduces an adaptive interview framework to gather rich persona context, demonstrating that LLMs improve decision alignment in moral dilemmas only when they selectively ground their decisi…
Dongdong Hua, Yifei Sun, Renhong Huang, Feng Gao +2 more
The paper introduces PTCG-Bench, a new benchmark using the Pokémon TCG to evaluate LLM agents' strategic decision-making and ability to self-evolve, finding that sustained self-evolution remains chall…
Kevin Wang, Anna Thöni, Benjamin Kempinski, Bobby Cheng +49 more
The paper introduces Mindgames, a comprehensive multi-game arena for evaluating LLM agents' sustained social and strategic reasoning, demonstrating that current evaluations are limited by structural s…
The paper introduces CRAB-Bench and RUSE, a rigorous evaluation framework that tests LLM agents on complex, interdependent tasks with realistic human user interactions, revealing significant performan…
The paper proposes a framework to model moral reasoning as an ethical distribution (ethical pluralism) rather than a single binary judgment, achieving high classification accuracy by integrating norma…
The paper introduces an outer-loop AI agent that autonomously redesigns LLM policy-synthesis pipelines for multi-agent social dilemmas, demonstrating that the optimal pipeline structure depends critic…
Yansong Ning, Mianpeng Liu, Jingwen Ye, Weidong Zhang +1 more
The paper introduces HRBench, a unified and comprehensive evaluation framework for systematically benchmarking and comparing various thinking-mode switching strategies in hybrid-reasoning LLMs.
Mingju Chen, Can Lv, Guibin Zhang, Heng Chang +1 more
HarnessForge introduces a meta-adaptive framework that jointly evolves the execution structure (harness) and the reasoning policy of LLM agents, significantly improving overall system performance acro…
Honghao Liu, Chengjin Xu, Xuhui Jiang, Cehao Yang +4 more
The paper demonstrates that confronting Large Reasoning Models (LRMs) with conflicting objectives, such as contradictory choices or conflicting alignment values, significantly increases their vulnerab…
The paper successfully demonstrates that Large Language Models (LLMs) can be induced to adopt coherent, human-like value structures, showing strong alignment with human psychological patterns.
Yaoming Li, Guangxiang Zhao, Qilong Shi, Lin Sun +2 more
This paper synthesizes over 150 scattered studies and reports to provide the first comprehensive primer on post-training reasoning data, organizing the field around data objects, utility, construction…
Tong Liu, Cheng Qian, Matej Cief, Yuan He +3 more
This paper analyzes tool-calling in LLM agents, demonstrating that evaluation results are highly sensitive to implementation details and proposing new techniques to significantly improve the efficienc…
Yuyan Bu, Haowei Li, Qirui Zheng, Bowen Dong +6 more
The paper introduces SPADE-Bench, a new benchmark designed to rigorously evaluate 'agent deception'—the divergence between an agent's reported plan and its actual executed actions—which is a critical…
The paper proposes a persona-based evaluation framework that replaces monolithic AI benchmarks with structured cognitive profiles to capture diverse human perspectives, while also identifying the chal…
The paper introduces Rationalize, a role-pair framework that facilitates shared semantic reasoning between humans and AI models to achieve deep alignment of intent and action.
Xiwen Chen, Wenhui Zhu, Jingjing Wang, Peijie Qiu +12 more
S-SPPO introduces a dual-space semantic calibration framework to stabilize Self-Play Preference Optimization (SPPO), preventing policy degeneration when preference oracles assign overly confident wins…
The paper introduces Safe Equilibrium Policy Optimization (σepo{}) to train language models for multi-agent strategic tasks, achieving improved safety and robustness across various game domains.
The paper proposes that emergent misalignment, where LLMs behave poorly after fine-tuning, is caused by 'persona-model collapse,' which is demonstrated by significant deterioration in the model's abil…
The paper demonstrates that explicit gender cues systematically affect LLM value trade-offs, causing decision flips that are often masked or misattributed by the models themselves.