~ similar to 2605.30675· 20 results
The paper introduces a novel framework to quantify faithful confidence expression (FC) in Large Reasoning Models (LRMs), finding that FC remains a significant and challenging reliability target for th…
The paper proposes Shapley-based input uncertainty Quantification (ShaQ), a novel framework that uses Shapley values to precisely attribute input-induced uncertainty to specific spans of text, providi…
The paper proposes Sensitivity-Uncertainty Alignment (SUA), a framework that measures the misalignment between a model's prediction instability and its stated uncertainty to improve model reliability.
This paper proposes a method to improve error prediction for LLMs by explicitly disentangling input ambiguity from standard Uncertainty Quantification signals, showing that ambiguity information signi…
The paper introduces functional entropy, a code-specific uncertainty quantification method, which successfully predicts functional correctness in LLM-generated code by replacing natural language seman…
Yujia Tong, Yuxi Wang, Yunyang Wan, Tian Zhang +2 more
This paper investigates whether model compression techniques (like quantization and pruning) preserve a Large Language Model's ability to quantify its own uncertainty, finding that accuracy-only evalu…
Eric Onyame, Runtao Zhou, Kowshik Thopalli, Bhavya Kailkhura +1 more
This study demonstrates that Chain-of-Thought (CoT) monitoring is fundamentally fragile and unreliable for detecting misaligned behavior across typologically diverse languages, especially in low-resou…
The paper successfully demonstrates that Large Language Models (LLMs) can be induced to adopt coherent, human-like value structures, showing strong alignment with human psychological patterns.
The paper introduces a diagnostic framework to decompose multilingual LLM performance variance, showing that language identity and model-benchmark interactions are key drivers of performance gaps.
Yusheng He, Jizhe Zhou, Xia Du, Zheng Lin +2 more
This paper systematically analyzes how different architectural components of Large Vision-Language Models (LVLMs) contribute to hallucination robustness, finding that joint enhancement of visual fidel…
The paper identifies five persistent, deep-seated behavioral patterns ('training strata') in LLMs, observed through long-term, intimate human-AI interaction, suggesting that training artifacts survive…
The paper introduces Responsible Contrastive Soft Prompting (RCSP), a parameter-efficient method using soft prompts to improve LLM reliability by simultaneously suppressing hallucinations, encouraging…
The paper proposes sampling directly from approximations of an LLM posterior, conditioned on high-scoring regions, to generate more coherent and useful text compared to existing post-hoc hallucination…
Aniket Anand, Janvijay Singh, Zhewei Sun, Dilek Hakkani-Tür +1 more
The paper demonstrates that the AI-like style introduced by post-training alignment can be measured, localized, and causally removed using a novel ablation technique called PASTA.
The paper introduces MENTIS, a geometry-first framework that measures how preference alignment structurally changes the internal computations of language models, finding that these changes are selecti…
The paper challenges the conclusion that LLMs lack reasoning by demonstrating that reported performance drops on GSM-Symbolic are often statistically weak and partially attributable to dataset biases,…
The paper introduces the DECK taxonomy, a novel framework that classifies LLM hallucinations not by their content error, but by their detectability signature based on inter-sample consistency and toke…
The paper introduces the Triangulated Preference Shift score, an automated, curation-free metric to quantify systematic lexical biases introduced into Large Language Models during the preference-learn…
The study finds that for a relational intervention to successfully restore a language model's behavior after functional collapse, both a relational structure (e.g., acknowledgment) and a first-person…
The paper proposes 'Uncertainty,' a multiscale uncertainty estimator that focuses on low-probability tokens to improve the detection of AI-generated text by addressing boilerplate dominance and score…