~ similar to 2605.28920· 19 results
The paper proposes sampling directly from approximations of an LLM posterior, conditioned on high-scoring regions, to generate more coherent and useful text compared to existing post-hoc hallucination…
Yujia Tong, Yuxi Wang, Yunyang Wan, Tian Zhang +2 more
This paper investigates whether model compression techniques (like quantization and pruning) preserve a Large Language Model's ability to quantify its own uncertainty, finding that accuracy-only evalu…
Yule Liu, Yilong Yang, Jiale Teng, Hanze Jia +10 more
The paper systematically measures the risk of current image-to-3D models generating harmful geometries, finding that these models are effective at reconstruction and existing safeguards are insufficie…
The paper introduces Expected Value Alignment (EVA), a novel reward modeling procedure that allows continuous scoring of intermediate reasoning steps in formal mathematics verification while maintaini…
The paper introduces Drifting Preference Optimization (DrPO), an efficient online method for preference finetuning one-step text-to-image generators that avoids complex gradient calculations and model…
The paper proposes a persona-based evaluation framework that replaces monolithic AI benchmarks with structured cognitive profiles to capture diverse human perspectives, while also identifying the chal…
The paper proposes Sensitivity-Uncertainty Alignment (SUA), a framework that measures the misalignment between a model's prediction instability and its stated uncertainty to improve model reliability.
Bowen Wei, Nan Wang, Yuqing Zhou, Jinhao Pan +1 more
The paper proposes COSE, a method that uses an LLM's intrinsic confidence as an uncertainty signal to improve self-evolutionary training, achieving state-of-the-art performance on general reasoning an…
The paper proposes Personalized Federated Weighted Conformal Prediction (PFWCP), a novel framework that ensures statistically valid uncertainty quantification in multi-agent, heterogeneous settings wh…
The paper introduces functional entropy, a code-specific uncertainty quantification method, which successfully predicts functional correctness in LLM-generated code by replacing natural language seman…
Shuqiang Wang, Wei Cao, Jiaqi Weng, Jialing Tao +3 more
The paper proposes a black-box attack using a hierarchical genetic algorithm to induce 'overthinking' in Large Reasoning Models, demonstrating that this vulnerability can cause significant resource ex…
The paper introduces CROP, a novel conformal procedure that provides rigorous statistical guarantees for certifying the longest safe prefix of a language model's reasoning trace, allowing for targeted…
GLiGuard introduces a compact, schema-conditioned bidirectional encoder that achieves state-of-the-art performance in LLM content moderation across multiple safety dimensions while drastically reducin…
This book provides a compact, derivation-oriented mathematical primer that connects major families of generative AI models, showing their underlying structural relationships.
Jiasheng Zheng, Boxi Cao, Boxi Yu, Yuzhong Zhang +5 more
The paper introduces Atomic Decomposition and Recombination (ADR), a novel framework that generates genuinely novel and challenging verifiable code tasks, significantly improving the scalability of Re…
The paper proposes a hybrid reasoning framework where Large Language Models (LLMs) generate code to encode complex optimization problems into a preference-based Maximum Satisfiability (MaxSAT) format,…
The paper introduces GPIC, a massive, permissively licensed, and safety-filtered image corpus of 28 trillion pixels, designed to serve as a stable and accessible benchmark for large-scale visual gener…
This paper surveys the risks associated with world models, proposing a unified threat model and demonstrating adversarial attacks that show world models require rigorous safety standards comparable to…
Zhenting Qi, Susanna Maria Baby, Stefanie Anna Baby, Kan Yuan +4 more
The paper investigates the limits of self-evolution in LLM reasoning under closed-loop settings, finding that while self-improvement is significant, it consistently falls short of perfect oracle super…