~ similar to 2606.01845 · 20 results
Siddhesh Milind Pawar, Sarah Masud, Haneul Yoo, Alice Oh +1 more
The paper introduces FRANZ, a communicative audit framework, to evaluate how LLMs frame responses to subjective questions, finding that LLMs exhibit statistically significant and coupled differences i…
The paper demonstrates that increasing the toxicity of prompts significantly degrades the factual reliability of LLMs, a degradation linked to the selective amplification of perturbation-sensitive nod…
Daniel Arnould, Rashad Aziz, Zixuan Kang, Tanav Changal +4 more
CA-BED is a novel framework that improves LLM performance in interactive question-answering by integrating Bayesian Experimental Design to strategically select questions that maximize information gain…
The study finds that institutional experience may leave detectable, yet suppressible, traces in language that shape Large Language Model moral reasoning, particularly when institutional stakes are amb…
This study evaluates LLMs in conversational tutoring to identify high-confidence social biases, finding that state-of-the-art models are often overconfident in their incorrect assessments of stereotyp…
The paper investigates compositional abilities in LLMs and humans using the Personal Relation Task, finding that LLMs excel at the structured (Intensional) task while humans are better at the real-wor…
This paper investigates if team-based interaction improves LLM performance on complex reasoning tasks (ChGK), finding that structured team strategies significantly boost accuracy by acting as error-fi…
The paper introduces an adaptive interview framework to gather rich persona context, demonstrating that LLMs improve decision alignment in moral dilemmas only when they selectively ground their decisi…
The paper successfully demonstrates that Large Language Models (LLMs) can be induced to adopt coherent, human-like value structures, showing strong alignment with human psychological patterns.
The paper introduces a novel framework to quantify faithful confidence expression (FC) in Large Reasoning Models (LRMs), finding that FC remains a significant and challenging reliability target for th…
The paper proposes using question-asking as an inference-time intervention to probe a language model's hidden state, finding that the self-diagnosis process provides a predictive signal for final corr…
Chuang Ma, Qianying Liu, Tomoyuki Obuchi, Fei Cheng +5 more
The paper identifies a failure mode called spatial lexical bias in MLLMs, where adding a spatial word to options biases the model's choice, and demonstrates that this failure originates primarily from…
The paper challenges the conclusion that LLMs lack reasoning by demonstrating that reported performance drops on GSM-Symbolic are often statistically weak and partially attributable to dataset biases,…
The paper introduces Responsible Contrastive Soft Prompting (RCSP), a parameter-efficient method using soft prompts to improve LLM reliability by simultaneously suppressing hallucinations, encouraging…
Zhikai Pan, Chih-Ting Liao, Chunrui Liu, Xi Xiao +4 more
The paper introduces a multilingual benchmark (MentalMap) to test if LLMs build internal spatial world models from text, finding a universal 'L3 reasoning cliff' suggesting that text-only working memo…
The paper compares verbalized feature attributions and self-generated rationales for explaining model behavior, finding that the format and granularity of the explanation significantly affect its abil…
The paper introduces the Triangulated Preference Shift score, an automated, curation-free metric to quantify systematic lexical biases introduced into Large Language Models during the preference-learn…
Kyle Moore, Jesse Roberts, Daryl Watson, William Ward +1 more
This paper investigates whether large language models exhibit uncertainty signals similar to human judgment, examining both overt behavior and internal activation patterns to assess alignment and cali…
The paper demonstrates that LLM performance in zero-shot annotation is significantly limited by the alignment between the model's internal understanding and the task definition, showing that prompt-ba…
This study demonstrates that the tone of a prompt significantly affects the accuracy of various LLMs, so users should not assume that model reliability is robust to tone.