~ similar to 2604.24076v1· 19 results
The paper proposes Shapley-based input uncertainty Quantification (ShaQ), a novel framework that uses Shapley values to precisely attribute input-induced uncertainty to specific spans of text, providi…
The paper proposes Sensitivity-Uncertainty Alignment (SUA), a framework that measures the misalignment between a model's prediction instability and its stated uncertainty to improve model reliability.
The paper challenges the assumption that LLM safety is a binary threshold, proposing that safety failures occur in an 'instability region' and introducing Furina, a transferable attack that exploits t…
Yujia Tong, Yuxi Wang, Yunyang Wan, Tian Zhang +2 more
This paper investigates whether model compression techniques (like quantization and pruning) preserve a Large Language Model's ability to quantify its own uncertainty, finding that accuracy-only evalu…
Jiahe Guo, Xiangran Guo, Jiaxuan Chen, Weixiang Zhao +5 more
This paper introduces the concept of Safety Geometry Collapse, demonstrating that multimodal inputs degrade the safety separation of LLMs, and proposes ReGap, a training-free method that adaptively co…
Weiwei Qi, Zefeng Wu, Tianhang Zheng, Zikang Zhang +3 more
The paper proposes the Expected Safety Impact (ESI) framework to identify safety-critical parameters in LLMs, introducing targeted tuning methods (SET and SPA) to enhance safety and preserve alignment…
The study compared the cybersecurity risk assessment capabilities of five popular large language models (LLMs) against human experts, finding that LLMs consistently underestimated risks and require ma…
The paper demonstrates that LLM performance in zero-shot annotation is significantly limited by the alignment between the model's internal understanding and the task definition, showing that prompt-ba…
The paper introduces BiAxisAudit, a novel framework that evaluates LLM bias by analyzing bias scores across multiple prompt formats and within the internal inconsistency of model responses, revealing…
The paper introduces Canopy Entropy ($ ext{CE}^ ext{*}$), a novel metric that quantifies generation uncertainty across the entire output space, demonstrating that fine-tuning improves information conv…
The paper demonstrates that standard LLM evaluation metrics overestimate performance because they fail to account for the stability of outcomes, showing a significant gap between reported pass rates a…
This paper evaluates the performance of a Large Language Model (LLM) in a high-stakes context by comparing it to human experts and measuring variance and error magnitude.
This paper evaluates the performance of a Large Language Model (LLM) in a high-stakes context by comparing it to human experts and measuring variance and error magnitude.
This paper addresses the critical need for trustworthy LLMs in science by proposing a comprehensive, multi-layered defense framework and methodology to evaluate unique scientific vulnerabilities.
The paper introduces a novel framework to quantify faithful confidence expression (FC) in Large Reasoning Models (LRMs), finding that FC remains a significant and challenging reliability target for th…
The paper empirically evaluates domain-adapted and general-purpose LLMs for structured threat modelling (STRIDE on 5G security), finding that domain adaptation and model size do not guarantee reliable…
The paper challenges the conclusion that LLMs lack reasoning by demonstrating that reported performance drops on GSM-Symbolic are often statistically weak and partially attributable to dataset biases,…
Kyle Moore, Jesse Roberts, Daryl Watson, William Ward +1 more
This paper investigates whether large language models exhibit uncertainty signals similar to human judgment, examining both overt behavior and internal activation patterns to assess alignment and cali…
Wanying Ren, Xin Song, Futing Wang, Guoxiu He +1 more
The paper theoretically analyzes the limitations of parameter-based knowledge editing and empirically demonstrates that these methods consistently damage core LLM capabilities compared to retrieval-ba…