Papers similar to 2605.28707

~ similar to 2605.28707· 20 results

cs.CLcs.AIRecentMay 29, 2026

Do Large Language Models Encode Institutional Experience? Evidence from Cross-Linguistic Moral Reasoning Under Ambiguity

The study finds that institutional experience may leave detectable, yet suppressible, traces in language that shape Large Language Model moral reasoning, particularly when institutional stakes are amb…

View →

cs.AIcs.CYcs.HCRecentMay 27, 2026

The Illusion of Opting in AI-Mediated Consequential Decisions

Eugene Yu Ji

The paper argues that current AI systems create an 'illusion of opting,' giving the appearance of meaningful choice while eroding genuine agency, and proposes new ethical frameworks to address this.

View →

cs.AIcs.CYcs.HCRecentMay 28, 2026

Toward AI Systems That Understand Self and Others: A Multi-Phase Inference Framework for Human Cognitive Diversity and World-Model Alignment

Toru Takahashi

The paper proposes a Multi-Phase Inference Mechanism (MIM) to formalize how diverse world models arise, reframing alignment as making heterogeneous representations mutually processable rather than for…

View →

cs.MAcs.AIcs.CLRecentMay 28, 2026

Social Reasoning in Machines: Investigating Collective Truth-Seeking Dynamics in Large Language Model Debate

Tom Pecher

This paper simulates the Argumentative Theory of Reasoning (ATR) using multi-agent debate among LLMs, demonstrating that collective adversarial discourse significantly enhances truth-seeking performan…

View →

cs.AIRecentJun 1, 2026

RoleCDE:Benchmarking and Mitigating Role-Alignment Trade-offs in Role-Playing Agents

Huayi Lai, Shichao Song, Simin Niu, Hanyu Wang +4 more

The paper introduces RoleCDE, a novel benchmark that evaluates role-playing agents' ability to resolve conflicts between role-specific values and general alignment constraints, revealing a 'Role Value…

View →

cs.HCcs.AIcs.LGRecentMay 28, 2026

Rationalize: Shared Semantic Reasoning for Human-AI Alignment

Aritra Dasgupta, Naga Datha Saikiran Battula, Avina Nakarmi, Sohom Sen +2 more

The paper introduces Rationalize, a role-pair framework that facilitates shared semantic reasoning between humans and AI models to achieve deep alignment of intent and action.

View →

cs.AIcs.CLcs.LGRecentMay 29, 2026

A Persona-Based Evaluation Framework for Pluralistic Alignment in Generative AI

Atahan Karagoz

The paper proposes a persona-based evaluation framework that replaces monolithic AI benchmarks with structured cognitive profiles to capture diverse human perspectives, while also identifying the chal…

View →

cs.CLcs.AIRecentJun 1, 2026

A Primer in Post-Training Reasoning Data: What We Know About How It Works

Yaoming Li, Guangxiang Zhao, Qilong Shi, Lin Sun +2 more

This paper synthesizes over 150 scattered studies and reports to provide the first comprehensive primer on post-training reasoning data, organizing the field around data objects, utility, construction…

View →

cs.CRcs.AIRecentApr 10, 2026

Conflicts Make Large Reasoning Models Vulnerable to Attacks

Honghao Liu, Chengjin Xu, Xuhui Jiang, Cehao Yang +4 more

The paper demonstrates that confronting Large Reasoning Models (LRMs) with conflicting objectives, such as contradictory choices or conflicting alignment values, significantly increases their vulnerab…

View →

cs.CLRecentMay 29, 2026

Disagreeing Rationales: Rethinking Classification and Explainability Evaluation in Hate Speech Detection

Benedetta Muscato, Beiduo Chen, Gizem Gezici, Barbara Plank +1 more

This paper proposes a unified evaluation framework for hate speech detection that systematically assesses model performance and explainability across various label and rationale representation spaces,…

View →

cs.CLcs.AIRecentMay 28, 2026

Adaptive Interviewing for Persona Simulation in LLMs: Evidence-Grounded Reasoning Improves Decision Alignment

Ruoxi Su, Yuhan Liu, Jingyu Hu

The paper introduces an adaptive interview framework to gather rich persona context, demonstrating that LLMs improve decision alignment in moral dilemmas only when they selectively ground their decisi…

View →

cs.CLcs.AIRecentMay 29, 2026

Isolating LLM Lexical Bias: A Curation-Free Triangulated Metric for Preference-Stage Learning

Xiaoyang Ming, Jose Hernandez, Thomas Stephan Juzek

The paper introduces the Triangulated Preference Shift score, an automated, curation-free metric to quantify systematic lexical biases introduced into Large Language Models during the preference-learn…

View →

cs.CLcs.AIRecentJun 2, 2026

Quantifying Faithful Confidence Expression in Large Reasoning Models

Areeb Gani, Asal Meskin, Gabrielle Kaili-May Liu, Arman Cohan

The paper introduces a novel framework to quantify faithful confidence expression (FC) in Large Reasoning Models (LRMs), finding that FC remains a significant and challenging reliability target for th…

View →

cs.MAcs.AIcs.LGRecentMay 28, 2026

Discovering Cooperative Pipelines: Autoresearch for Sequential Social Dilemmas

Víctor Gallego

The paper introduces an outer-loop AI agent that autonomously redesigns LLM policy-synthesis pipelines for multi-agent social dilemmas, demonstrating that the optimal pipeline structure depends critic…

View →

cs.CLcs.AIRecentMay 28, 2026

COFT: Counterfactual-Conformal Decoding for Fair Chain-of-Thought Reasoning in Large Language Models

Arya Fayyazi, Mehdi Kamal, Massoud Pedram

COFT is a training-free decoding method that significantly reduces societal biases in large language model chain-of-thought reasoning by applying token-level fairness control at decode time.

View →

cs.AIRecentMay 28, 2026

MINDGAMES: A Live Arena for Evaluating Social and Strategic Reasoning in Multi-Agent LLMs

Kevin Wang, Anna Thöni, Benjamin Kempinski, Bobby Cheng +49 more

The paper introduces Mindgames, a comprehensive multi-game arena for evaluating LLM agents' sustained social and strategic reasoning, demonstrating that current evaluations are limited by structural s…

View →

cs.CYcs.AIcs.MARecentMay 28, 2026

Dissociative Identity: Language Model Agents Lack Grounding for Reputation Mechanisms

Botao Amber Hu, Helena Rong, Max Van Kleek

The paper argues that traditional identity-based reputation mechanisms are structurally inapplicable to language model agents because their mutable, modular nature makes them ontologically dissociativ…

View →

cs.AIcs.CLRecentMay 28, 2026

Teaching Values to Machines: Simulating Human-Like Behavior in LLMs

Asaf Yehudai, Naama Rozen, Ariel Gera

The paper successfully demonstrates that Large Language Models (LLMs) can be induced to adopt coherent, human-like value structures, showing strong alignment with human psychological patterns.

View →

cs.AIRecentMay 27, 2026

Calibrating Conservatism for Scalable Oversight

William Overman, Mohsen Bayati

The paper introduces Calibrated Collective Oversight (CCO), a novel framework that uses aggregated auxiliary scoring functions and Conformal Decision Theory to provide statistically guaranteed, scalab…

View →