Papers similar to 2605.29930

Similar to 2605.29930 · 20 results

cs.AI · cs.CL · cs.LG · Recent · May 29, 2026

A Persona-Based Evaluation Framework for Pluralistic Alignment in Generative AI

The paper proposes a persona-based evaluation framework that replaces monolithic AI benchmarks with structured cognitive profiles to capture diverse human perspectives, while also identifying the chal…

cs.MA · cs.AI · cs.CL · Recent · May 28, 2026

Social Reasoning in Machines: Investigating Collective Truth-Seeking Dynamics in Large Language Model Debate

Tom Pecher

This paper simulates the Argumentative Theory of Reasoning (ATR) using multi-agent debate among LLMs, demonstrating that collective adversarial discourse significantly enhances truth-seeking performan…

cs.AI · Recent · May 27, 2026

Training Stratigraphy: Persistent Behavioral Artifacts in Large Language Models Observed Through Longitudinal AI-Human Interaction

Chen Ying Claude, Zhihan Luo

The paper identifies five persistent, deep-seated behavioral patterns ('training strata') in LLMs, observed through long-term, intimate human-AI interaction, suggesting that training artifacts survive…

cs.AI · cs.LG · Recent · May 28, 2026

When and How Human Curation Backfires: Preference Alignment under Multi-Model Self-Consuming Loop

Yang Zhang, Xiukun Wei, Xueru Zhang

This paper analyzes multi-model self-consuming training, showing that while human curation helps individual models, cross-model interactions can degrade long-term alignment by dampening or inverting t…

cs.AI · cs.LG · Recent · May 27, 2026

Beyond Binary Moral Judgment: Modeling Ethical Pluralism in AI

Aisha Aijaz, Rahul Goel, Arnav Batra, Raghava Mutharaju

The paper proposes a framework to model moral reasoning as an ethical distribution (ethical pluralism) rather than a single binary judgment, achieving high classification accuracy by integrating norma…

cs.HC · cs.AI · cs.CL · Recent · May 29, 2026

TUX: Measuring Human–AI Tacit Understanding

Yueshen Li, Hanyi Min, Vedant Das Swain, Koustuv Saha

The paper introduces the Tacit Understanding Index (TUX) to measure non-explicit alignment between humans and LLMs, finding that this alignment is significantly structured by individual person-level t…

cs.LG · cs.AI · Recent · May 28, 2026

Honest Lying: Understanding Memory Confabulation in Reflexive Agents

Prakhar Dixit, Sadia Kamal, Tim Oates

The paper demonstrates that self-reflective agents can systematically confabulate incorrect memories, leading them to fail tasks even when the environment resets, and proposes a metric and mitigation…

cs.AI · cs.MA · Recent · May 29, 2026

MindZero: Learning Online Mental Reasoning With Zero Annotations

Shunchi Zhang, Jin Lu, Chuanyang Jin, Yichao Zhou +2 more

MindZero introduces a self-supervised reinforcement learning framework that trains multimodal large language models (MLLMs) for efficient and robust online mental reasoning without requiring explicit…

cs.AI · cs.CL · Recent · May 28, 2026

Teaching Values to Machines: Simulating Human-Like Behavior in LLMs

Asaf Yehudai, Naama Rozen, Ariel Gera

The paper demonstrates that large language models (LLMs) can be induced to adopt coherent, human-like value structures that align strongly with human psychological patterns.

cs.AI · Recent · May 28, 2026

MINDGAMES: A Live Arena for Evaluating Social and Strategic Reasoning in Multi-Agent LLMs

Kevin Wang, Anna Thöni, Benjamin Kempinski, Bobby Cheng +49 more

The paper introduces Mindgames, a comprehensive multi-game arena for evaluating LLM agents' sustained social and strategic reasoning, demonstrating that current evaluations are limited by structural s…

cs.CL · cs.AI · cs.CY · Recent · May 29, 2026

If LLMs Have Human-Like Attributes, Then So Does Age of Empires II

Adrian de Wynter

The paper argues that purported anthropomorphic attributes of LLMs are not unique to language models but are substrate-dependent, demonstrating this by training a neural network on the game Age of Emp…

cs.CL · cs.AI · Recent · Jun 2, 2026

Quantifying Faithful Confidence Expression in Large Reasoning Models

Areeb Gani, Asal Meskin, Gabrielle Kaili-May Liu, Arman Cohan

The paper introduces a novel framework to quantify faithful confidence expression (FC) in Large Reasoning Models (LRMs), finding that FC remains a significant and challenging reliability target for th…

cs.CY · cs.AI · cs.MA · Recent · May 28, 2026

Dissociative Identity: Language Model Agents Lack Grounding for Reputation Mechanisms

Botao Amber Hu, Helena Rong, Max Van Kleek

The paper argues that traditional identity-based reputation mechanisms are structurally inapplicable to language model agents because their mutable, modular nature makes them ontologically dissociativ…

cs.HC · cs.AI · cs.LG · Recent · May 28, 2026

Rationalize: Shared Semantic Reasoning for Human-AI Alignment

Aritra Dasgupta, Naga Datha Saikiran Battula, Avina Nakarmi, Sohom Sen +2 more

The paper introduces Rationalize, a role-pair framework that facilitates shared semantic reasoning between humans and AI models to achieve deep alignment of intent and action.

cs.CL · Recent · May 29, 2026

Language Models Can Resolve Reference Compositionally, But It's Not Their Native Strength: The Case of the Personal Relation Task

Bart Evelo, Meaghan Fowlie, Denis Paperno

The paper investigates compositional abilities in LLMs and humans using the Personal Relation Task, finding that LLMs excel at the structured (Intensional) task while humans are better at the real-wor…

cs.MA · cs.AI · cs.LG · Recent · May 29, 2026

Dreaming Of Others: Latent Teammate Modeling In World Models For Multi-Agent Reinforcement Learning

Tomas Leroy-Stone

The paper proposes extending world models for multi-agent reinforcement learning by factorizing the latent state to explicitly model and predict the unobservable intentions and behaviors of teammates.

cs.CL · cs.AI · Recent · May 29, 2026

Human-Alignment, Calibration, and Activation Patterns in Large Language Model Uncertainty

Kyle Moore, Jesse Roberts, Daryl Watson, William Ward +1 more

This paper investigates whether large language models exhibit uncertainty signals similar to human judgment, examining both overt behavior and internal activation patterns to assess alignment and cali…

cs.CL · Recent · May 29, 2026

Anchoring LLM Gender Bias to Human Baselines: A Cross-Lingual Audit

Jiwoo Choi, Seonwoo Ahn, Tongxin Zhang, Seohyon Jung

The paper audits six LLMs across four languages, finding that their gender stereotyping is significantly wider than human baselines and that cross-lingual translation fundamentally alters the nature o…

cs.AI · Recent · May 27, 2026

Do LLMs Build World Models From Text? A Multilingual Diagnostic of Spatial Reasoning

Zhikai Pan, Chih-Ting Liao, Chunrui Liu, Xi Xiao +4 more

The paper introduces a multilingual benchmark (MentalMap) to test if LLMs build internal spatial world models from text, finding a universal 'L3 reasoning cliff' suggesting that text-only working memo…
