20 results for “Error type prediction”
Filter: CS papers only. Hybrid search (keyword + semantic), ranked by combined score.
This paper proposes DeMix, a novel framework for simultaneously diagnosing erroneous samples and their error types in machine learning models.
This paper systematically studies how soft errors propagate during Large Language Model (LLM) inference using a novel fault-injection framework, providing critical insights and mitigation strategies f…
This paper proposes a method to improve error prediction for LLMs by explicitly disentangling input ambiguity from standard Uncertainty Quantification signals, showing that ambiguity information signi…
The paper introduces prefix filters and an algorithm (Palla) to systematically learn and apply specific error patterns in Large Language Models, significantly improving constrained generation tasks li…
CB-SLICE is a novel concept-based method for discovering model error slices; it leverages Concept Bottleneck Models (CBMs) to provide fine-grained, faithful explanations directly linked to the root c…
Xinle Deng, Ruobin Zhong, Hujin Peng, Xiaoben Lu +14 more
The paper introduces MemTrace, a framework that treats LLM memory pipelines as traceable graphs to systematically diagnose and automatically correct memory-related errors, boosting performance by up t…
Mikhail L. Arbuzov, Lee Mosbacker, Sisong Bei, Ziwei Dong +2 more
The paper reframes LLM reliability from an impossible universal problem to a manageable, local patch-based problem, showing that sufficient interventions can be found by focusing on recurring failure…
Jungyeul Park, Kyungtae Lim, Wonjun Oh, Benjamin Nguyen +3 more
This paper refines word-based grammatical error annotation for L2 Korean by adapting existing resources to better reflect Korean morphology and error types, improving the evaluation of Korean Grammati…
The paper introduces SafetyDrift, a predictive model that forecasts when AI agents will violate safety protocols by analyzing the cumulative risk across sequences of individually safe actions.
The paper introduces TalkTag, an LLM-based tool that automates fine-grained morphosyntactic error annotation for spoken-language transcripts, providing a scalable alternative to labor-intensive manual…
FPMoE introduces a sparse Mixture-of-Experts (MoE) architecture to improve functional code generation across multiple functional programming languages, achieving state-of-the-art performance with fewe…
This paper compares two automatic label error detection methods, Confident Learning and Dataset Cartography, demonstrating that targeted data filtering significantly improves model perfo…
This systematic mapping survey reviews label-efficient approaches for code vulnerability detection, synthesizing five paradigm families and providing a decision guide to navigate trade-offs.
The paper introduces SB-ECC, a novel score-based decoder that models error correction as continuous-time denoising, achieving state-of-the-art performance across various code families and noise levels…
Jiaming Wang, Ziteng Feng, Jiangtao Wu, Ruihao Li +7 more
The paper introduces TELBench and the DRIFT framework to enable fine-grained, span-level error localization in deep-research agents, significantly improving the ability to pinpoint exactly where an ag…
Marko Kojic, Ivan Bondyrev, Aral de Moor, Joseph Shtok +5 more
Mellum 2 is an open-weight 12B Mixture-of-Experts (MoE) language model specialized for software engineering, achieving performance competitive with larger models while maintaining the efficiency of a…
The paper analyzes the failure modes of aggressive 2-bit quantization in large reasoning models, proposing lightweight controls like FP16 planning and loop rescue to restore accuracy and achieve pract…
CSULoRA is a post-hoc method that corrects trained LoRA adapters by estimating a safety-aligned subspace and solving a penalized minimum-change problem to attenuate unsafe update directions while pres…
The paper investigates predictive multiplicity and arbitrariness in recidivism risk assessment, finding that similarly accurate models often exhibit high predictive agreement, and proposes a simple po…
Kıvanç Kuzey Dikici, Serdar Kara, Semih Çağlar, Eray Tüzün +1 more
SERSEM introduces a selective entropy-weighted scoring framework to significantly improve Membership Inference Attacks (MIAs) against code LLMs by focusing on human-centric coding anomalies rather tha…