Papers similar to 2604.20903v1 · 20 results

cs.CL, cs.AI, cs.LG · Recent · Jun 1, 2026

The Role of Ambiguity in Error Prediction via Uncertainty Quantification

Ieva Raminta Staliūnaitė, James Bishop, Andreas Vlachos

This paper proposes a method to improve error prediction for LLMs by explicitly disentangling input ambiguity from standard Uncertainty Quantification signals, showing that ambiguity information signi…

cs.AI · Recent · Jun 1, 2026

Does Compression Preserve Uncertainty? A Unified Benchmark for Quantized and Sparse LLMs via Conformal Prediction

Yujia Tong, Yuxi Wang, Yunyang Wan, Tian Zhang +2 more

This paper investigates whether model compression techniques (like quantization and pruning) preserve a Large Language Model's ability to quantify its own uncertainty, finding that accuracy-only evalu…

cs.AI · Recent · May 28, 2026

Harnessing non-adversarial robustness in large language models

Qinghua Zhou, Ellina Aleshina, Andrey Lovyagin, Oleg Somov +5 more

The paper proposes a debiasing fine-tuning technique to efficiently enhance the robustness of Large Language Models against semantically similar but textually altered prompts.

cs.AI · Recent · May 28, 2026

Aligned but Fragile: Enhancing LLM Safety Robustness via Zeroth-Order Optimization

Zhihao Liu, Yifan Wu, Jian Lou, Di Wang +2 more

The paper proposes a novel zeroth-order optimization framework to enhance the robustness of LLM safety alignment, showing that few refinement steps can significantly improve safety while maintaining u…

cs.AI · Recent · May 27, 2026

Localizing Input Uncertainty Quantification for Large Language Models via Shapley Values

Seongjun Lee, Suwan Yoon, Changhee Lee

The paper proposes Shapley-based input uncertainty Quantification (ShaQ), a novel framework that uses Shapley values to precisely attribute input-induced uncertainty to specific spans of text, providi…

cs.LG, cs.AI, cs.CL · Empirical · Recent · Jul 3, 2026

Aligning Language Models with Selective Prediction

Gaoxiang Luo, Yifan Wu, Sinian Zhang, Aryan Deshwal +1 more

This paper proposes a method called selective prediction to enhance the reliability of large language models by allowing them to only predict for inputs where they are likely to be correct, reducing e…

cs.CL, cs.AI · Recent · May 29, 2026

Human-Alignment, Calibration, and Activation Patterns in Large Language Model Uncertainty

Kyle Moore, Jesse Roberts, Daryl Watson, William Ward +1 more

This paper investigates whether large language models exhibit uncertainty signals similar to human judgment, examining both overt behavior and internal activation patterns to assess alignment and cali…

cs.CL · Recent · Jun 1, 2026

On the Salience of Low-Probability Tokens for AI-Generated Text Detection: A Multiscale Uncertainty Perspective

Yikai Guo, Bin Wang, Xilai Fan, Wenjun Ke +1 more

The paper proposes 'Uncertainty,' a multiscale uncertainty estimator that focuses on low-probability tokens to improve the detection of AI-generated text by addressing boilerplate dominance and score…

cs.LG, cs.AI, cs.CR · Recent · May 11, 2026

Leveraging RAG for Training-Free Alignment of LLMs

John T. Halloran

The paper introduces RAG-Pref, a novel, training-free Retrieval Augmented Generation (RAG) method for preference alignment that significantly improves LLM refusal guardrails against agentic attacks wi…

cs.LG, cs.AI, cs.CR · Recent · May 6, 2026

Information Theoretic Adversarial Training of Large Language Models

Yiwei Zhang, Jeremiah Birrell, Reza Ebrahimi, Rouzbeh Behnia +2 more

The paper proposes WARDEN, a distributionally robust adversarial training framework that significantly reduces LLM vulnerability to adversarial attacks by dynamically reweighting hard adversarial exam…

cs.CR, cs.CL · Recent · Apr 9, 2026

The Art of (Mis)alignment: How Fine-Tuning Methods Effectively Misalign and Realign LLMs in Post-Training

Rui Zhang, Hongwei Li, Yun Shen, Xinyue Shen +5 more

The paper investigates how various fine-tuning methods can be used both to intentionally misalign and subsequently realign large language models (LLMs), revealing distinct strengths for attack and def…

cs.AI, cs.CL, cs.CR · Recent · Apr 27, 2026

An Information-Geometric Framework for Stability Analysis of Large Language Models under Entropic Stress

Hikmat Karimov, Rahid Zahid Alekberli

The paper proposes a novel information-geometric framework to analyze LLM stability by integrating task utility, external entropy, and internal structural proxies, showing this composite score improve…

cs.CL, cs.AI · Recent · Jun 2, 2026

Quantifying Faithful Confidence Expression in Large Reasoning Models

Areeb Gani, Asal Meskin, Gabrielle Kaili-May Liu, Arman Cohan

The paper introduces a novel framework to quantify faithful confidence expression (FC) in Large Reasoning Models (LRMs), finding that FC remains a significant and challenging reliability target for th…

cs.CR, cs.AI, cs.LG · Recent · May 24, 2026

Furina: Fragmented Uncertainty-Driven Refusal Instability Attack

Tongxi Wu, Jian Zhang, Yang Gao

The paper challenges the assumption that LLM safety is a binary threshold, proposing that safety failures occur in an 'instability region' and introducing Furina, a transferable attack that exploits t…

eess.AS · Empirical · Recent · Jul 8, 2026

UBG-Net: An Uncertainty-aware Bayesian Gating Network for Robust Audio-Visual Speech Recognition

Jinjie Fu, Hang Chen, Wu Guo, Zhijun Zhang +2 more

This paper proposes a framework, UBG-Net, for robust audio-visual speech recognition using a Modality Uncertainty-aware Bayesian Fusion mechanism and Distribution Uncertainty-aware Hierarchical Voting…

cs.LG, cs.CR · Recent · Jun 2, 2026

When Autoregressive Consistency Hurts Safety Alignment

Bochen Lyu, Yiyang Jia, Xiaohao Cai, Zhanxing Zhu

The paper argues that shallow safety alignment in LLMs is due to autoregressive consistency, a mechanism that allows small harmful inputs to redirect the model's generation to unsafe outputs, necessit…

cs.LG, cs.CV · Empirical · Recent · Jun 30, 2026

CoMet: Context and Multiplicity Decomposition for Multimodal Uncertainty Estimation

Sanghyuk Chun, William Yang, Amaya Dharmasiri, Olga Russakovsky

The paper proposes CoMet, a method for uncertainty estimation in multimodal large language models, which decomposes uncertainty into context-specific and multiplicity-specific terms.

cs.AI, cs.CR · Recent · May 18, 2026

Safety Geometry Collapse in Multimodal LLMs and Adaptive Drift Correction

Jiahe Guo, Xiangran Guo, Jiaxuan Chen, Weixiang Zhao +5 more

This paper introduces the concept of Safety Geometry Collapse, demonstrating that multimodal inputs degrade the safety separation of LLMs, and proposes ReGap, a training-free method that adaptively co…

cs.LG, cs.AI · Empirical · Recent · Jun 30, 2026

Evil Spectra: How Optimisers can Amplify or Suppress Emergent Misalignment

Jason R. Brown, Patrick Leask, Lev McKinney

This paper systematically characterises the sensitivity of emergent misalignment (EM) in LLMs to various training choices, finding that the choice of optimiser has the largest effect on misalignment r…
