~ similar to 2605.29930· 20 results
The paper proposes a persona-based evaluation framework that replaces monolithic AI benchmarks with structured cognitive profiles to capture diverse human perspectives, while also identifying the chal…
This paper simulates the Argumentative Theory of Reasoning (ATR) using multi-agent debate among LLMs, demonstrating that collective adversarial discourse significantly enhances truth-seeking performan…
The paper identifies five persistent, deep-seated behavioral patterns ('training strata') in LLMs, observed through long-term, intimate human-AI interaction, suggesting that training artifacts survive…
This paper analyzes multi-model self-consuming training, showing that while human curation helps individual models, cross-model interactions can degrade long-term alignment by dampening or inverting t…
The paper proposes a framework to model moral reasoning as an ethical distribution (ethical pluralism) rather than a single binary judgment, achieving high classification accuracy by integrating norma…
The paper introduces the Tacit Understanding Index (TUX) to measure non-explicit alignment between humans and LLMs, finding that this alignment is significantly structured by individual person-level t…
The paper demonstrates that self-reflective agents can systematically confabulate incorrect memories, leading them to fail tasks even when the environment resets, and proposes a metric and mitigation…
Shunchi Zhang, Jin Lu, Chuanyang Jin, Yichao Zhou +2 more
MindZero introduces a self-supervised reinforcement learning framework that trains multimodal large language models (MLLMs) for efficient and robust online mental reasoning without requiring explicit…
The paper successfully demonstrates that Large Language Models (LLMs) can be induced to adopt coherent, human-like value structures, showing strong alignment with human psychological patterns.
Kevin Wang, Anna Thöni, Benjamin Kempinski, Bobby Cheng +49 more
The paper introduces Mindgames, a comprehensive multi-game arena for evaluating LLM agents' sustained social and strategic reasoning, demonstrating that current evaluations are limited by structural s…
The paper argues that purported anthropomorphic attributes of LLMs are not unique to language models but are substrate-dependent, demonstrating this by training a neural network on the game Age of Emp…
The paper introduces a novel framework to quantify faithful confidence expression (FC) in Large Reasoning Models (LRMs), finding that FC remains a significant and challenging reliability target for th…
The paper argues that traditional identity-based reputation mechanisms are structurally inapplicable to language model agents because their mutable, modular nature makes them ontologically dissociativ…
The paper introduces Rationalize, a role-pair framework that facilitates shared semantic reasoning between humans and AI models to achieve deep alignment of intent and action.
The paper investigates compositional abilities in LLMs and humans using the Personal Relation Task, finding that LLMs excel at the structured (Intensional) task while humans are better at the real-wor…
The paper proposes extending world models for multi-agent reinforcement learning by factorizing the latent state to explicitly model and predict the unobservable intentions and behaviors of teammates.
Kyle Moore, Jesse Roberts, Daryl Watson, William Ward +1 more
This paper investigates whether large language models exhibit uncertainty signals similar to human judgment, examining both overt behavior and internal activation patterns to assess alignment and cali…
The paper audits six LLMs across four languages, finding that their gender stereotyping is significantly wider than human baselines and that cross-lingual translation fundamentally alters the nature o…
Zhikai Pan, Chih-Ting Liao, Chunrui Liu, Xi Xiao +4 more
The paper introduces a multilingual benchmark (MentalMap) to test if LLMs build internal spatial world models from text, finding a universal 'L3 reasoning cliff' suggesting that text-only working memo…