20 results for “Missingness mechanisms”
CS papers only. Hybrid search: keyword + semantic, ranked by combined score.
Mandana Samiei, Eunice Yiu, Anthony GX-Chen, Dongyan Lin +4 more
This paper investigates whether adults' struggles with conjunctive causal rules persist when they have agency through active exploration.
Renjie Gu, Jiaxu Li, Yihao Wang, Yun Yue +7 more
The paper addresses the 'detection-to-abstention gap' in reasoning models, where detecting insufficient information does not lead to abstention, by proposing a novel control framework that forces mode…
Nizar Islah, Istabrak Abbes, Irina Rish, Sarath Chandar +1 more
This paper proposes a method to recover recoverability structure from failed traces of post-trained language models, enabling test-time routing and post-training analysis.
The paper formalizes the concept of a causal pathway for rare events, showing that testable implications can be derived solely from this pathway abstraction, simplifying complex causal modeling.
The paper demonstrates that extended pure neural reasoning fails on complex, deterministic state-tracking tasks beyond a certain 'Deterministic Horizon,' necessitating the integration of external tool…
The paper introduces an Item Response Theory (IRT)-based indicator that effectively identifies likely mislabeled items in existing LLM benchmarks, revealing systematic errors in labeling and model spe…
This paper diagnoses a bias-dominated shortcut in class-level machine unlearning, where forgetting is achieved by suppressing classification head biases, and proposes bias-aware mechanisms to mitigate…
The paper demonstrates that the phenomenon of 'subliminal learning,' where behavioral traits are transmitted between language models, is not a fundamental learning mechanism but rather a fragile artif…
The paper introduces Responsible Contrastive Soft Prompting (RCSP), a parameter-efficient method using soft prompts to improve LLM reliability by simultaneously suppressing hallucinations, encouraging…
Sunisth Kumar, Xanh Ho, Tim Schopf, Andre Greiner-Petter +2 more
The paper explains the 'table-chart gap' in scientific claim verification by showing that multimodal LLMs successfully encode information from charts but fail to route it to the final prediction layer…
Haoxiang Zhang, Qixin Xu, Zhuofeng Li, Lei Zhang +3 more
The paper analyzes observation masking in long-horizon search agents, finding that its effectiveness depends on a complex interaction between the model's capacity and the retriever's strength, exhibit…
Jiahao Huang, Fei Cheng, Junfeng Jiang, Zefan Yu +1 more
The paper introduces BenchTrace, a novel benchmark designed to rigorously evaluate the self-evolution and reflection capabilities of LLM agents, revealing that current models struggle with accurate fa…
This paper investigates the production-evaluation gap in Large Reasoning Models (LRMs), finding that while LRMs excel at generating solutions, they struggle significantly to evaluate flawed reasoning,…
Xiaosong Han, Ke Chen, Xindi Dai, Di Liang +6 more
TRACE proposes a novel method to mitigate catastrophic forgetting in continual LLM fine-tuning by identifying and isolating a small, task-specific subset of essential parameters for each task.
Mikhail L. Arbuzov, Lee Mosbacker, Sisong Bei, Ziwei Dong +2 more
The paper reframes LLM reliability from an impossible universal problem to a manageable, local patch-based problem, showing that sufficient interventions can be found by focusing on recurring failure…
The paper introduces the DECK taxonomy, a novel framework that classifies LLM hallucinations not by their content error, but by their detectability signature based on inter-sample consistency and toke…
The paper introduces the Sequential Triply Robust (STR) estimator, a method that corrects for multiple systematic biases (authorization, reporting, delay, and corruption) in chargeback labels to achie…
The paper introduces Score Broadcast and Decorrelation (SBD), a general theoretical framework that unifies broadcast-based credit assignment across various differentiable loss functions by leveraging…
The paper introduces Inconsistency-Aware Minimization (IAM), a novel training objective that uses a label-free measure called local inconsistency to improve model generalization, particularly in semi-…
The paper analyzes backtracking dynamics in long reasoning traces to distinguish between useful self-correction and unproductive revision, finding that correct reasoning exhibits early, isolated repai…