~ similar to 2606.00820· 20 results
The study finds that in multi-agent systems, peer agreement makes LLMs more susceptible to adopting misleading answers than to correcting genuinely wrong ones, suggesting a need for verification over…
This paper simulates the Argumentative Theory of Reasoning (ATR) using multi-agent debate among LLMs, demonstrating that collective adversarial discourse significantly enhances truth-seeking performan…
Kevin Wang, Anna Thöni, Benjamin Kempinski, Bobby Cheng +49 more
The paper introduces Mindgames, a comprehensive multi-game arena for evaluating LLM agents' sustained social and strategic reasoning, demonstrating that current evaluations are limited by structural s…
This paper investigates if team-based interaction improves LLM performance on complex reasoning tasks (ChGK), finding that structured team strategies significantly boost accuracy by acting as error-fi…
The paper identifies a failure mode called unfaithful capitulation (UC), where reasoning models maintain a correct internal thought process (chain-of-thought) but output an incorrect final answer when…
The paper introduces a novel framework to quantify faithful confidence expression (FC) in Large Reasoning Models (LRMs), finding that FC remains a significant and challenging reliability target for th…
Maharshi Gor, Yoo Yeon Sung, Yu Hou, Eve Fleisig +3 more
This study investigates human-AI collaboration in question answering, finding that while collaboration is beneficial, humans make suboptimal decisions by both under-relying on correct AI suggestions a…
The study found that human judgment of logical fallacies is significantly biased by source labels (e.g., human vs. AI), while LLM evaluations remained comparatively stable across these source conditio…
The study extends cooperative bias testing across diverse, next-generation LLMs, finding that provider identity is a stronger predictor of cooperative equilibrium than model generation, and that noise…
Hamidreza Hasani Balyani, Seyed Pouyan Mousavi Davoudi, Alireza Amiri-Margavi, Amin Gholami Davodi +1 more
The paper establishes a benchmark based on the cheap-talk model to test LLM honesty when their incentives conflict with the user's, finding that models consistently over-reveal information regardless…
Yuhan Wang, Shuochen Chang, Yalin Feng, Dongsheng Ma +7 more
The paper proposes EAGLE, a novel evidence-aligned multi-agent framework, demonstrating that requiring shared visual evidence among agents is crucial for achieving reliable and trustworthy consensus i…
Zizhuo Lin, Quanling Liu, Jinsheng Quan, Chao Zhang +5 more
The paper introduces Canonical-Context On-Policy Distillation (CCOPD) to improve multi-turn language model performance by mitigating 'self-anchored drift,' ensuring consistent answers regardless of wh…
The paper proposes that emergent misalignment, where LLMs behave poorly after fine-tuning, is caused by 'persona-model collapse,' which is demonstrated by significant deterioration in the model's abil…
Honghao Liu, Chengjin Xu, Xuhui Jiang, Cehao Yang +4 more
The paper demonstrates that confronting Large Reasoning Models (LRMs) with conflicting objectives, such as contradictory choices or conflicting alignment values, significantly increases their vulnerab…
The paper demonstrates 'argument collapse,' showing that LLMs tend to converge on a small, repetitive set of polished arguments when generating long-form public debates, significantly reducing the div…
This paper analyzes the internal decision-making process of large language models by tracking how the answer score changes across multiple internal computational steps (trajectories), finding that mod…
The paper proposes an Interpretive Audit Pipeline to evaluate LLMs for public comment analysis, arguing that measuring inter-model disagreement is crucial because standard accuracy metrics fail to det…
The paper introduces BiAxisAudit, a novel framework that evaluates LLM bias by analyzing bias scores across multiple prompt formats and within the internal inconsistency of model responses, revealing…
Jiatan Huang, Mingchen Li, Ziming Li, Sunjae Kwon +2 more
The paper proposes CAGE-CAL, a counterfactual graph calibration framework, to accurately assess the reliability and detect over-confidence in multi-agent LLM systems after agents communicate.
The paper demonstrates that self-reflective agents can systematically confabulate incorrect memories, leading them to fail tasks even when the environment resets, and proposes a metric and mitigation…