20 results for “LLM training”
CS papers onlyHybrid search: Keyword + semantic, ranked by combined score.ⓘ
Want pure semantic search? Try claim verification →
This paper evaluates the performance of a Large Language Model (LLM) in a high-stakes context by comparing it to human experts and measuring variance and error magnitude.
Yalun Dai, Yangyu Huang, Tongshen Yang, Yonghan Wang +7 more
This paper proposes four guidelines and two novel data ordering methods (STR and SAW) to systematically optimize data organization, significantly enhancing the stability and performance of LLM trainin…
The paper introduces Synthesis Data Reversion (SDR), a method that infers the data laundering transformation used in LLM training and synthesizes queries to restore the detection signals lost when pro…
This paper proposes using offline reinforcement learning (RL) as an efficient alternative to online RL for post-training code-generating LLMs, demonstrating its effectiveness, especially for smaller m…
Mohammed Kharma, Ahmed Sabbah, Radi Jarrar, Samer Zain +2 more
The study found that providing developers with a layer-based security training package significantly reduces the number and severity of security vulnerabilities in LLM-assisted web application develop…
Zihan Liu, Yizhen Wang, Rui Wang, Xiu Tang +1 more
This survey provides a comprehensive, structured taxonomy of split learning techniques for fine-tuning Large Language Models (LLMs), covering model optimization, system efficiency, and privacy preserv…
The paper introduces Multi-Legal-Bench, a novel cross-jurisdictional benchmark evaluating LLMs on five standardized legal reasoning tasks across six diverse countries, demonstrating that cross-lingual…
The paper introduces BenGER, a comprehensive benchmark for evaluating LLMs on German legal reasoning, demonstrating that closed-flagship models perform best and that human-AI co-creation significantly…
The paper introduces an LLM-based pipeline that tags learning resources with structured competencies, achieving strong performance while providing traceable evidence and leveraging graph constraints.
This study compares different levels of LLM access in a statistics course, finding that structured, guided use significantly improves students' reasoning skills and independent learning compared to un…
The paper introduces a novel, non-deep neural network architecture that achieves the performance of LLMs by finding the global optimum of the loss function in a single, closed-form iteration, eliminat…
This paper investigates the 'faithfulness gap' in LLM agents—the discrepancy between stated reasoning and actual action—by decomposing it into two opposing steps: reasoning-to-conclusion and conclusio…
Junjie Chen, Yuxi Dong, Haitao Li, Weihang Su +4 more
The paper introduces LongJudgeBench, a new benchmark designed to evaluate the reliability of LLM judges specifically for complex, long-form output evaluation, revealing significant instability gaps in…
The paper evaluates an automated legal triage system (FETCH) that uses follow-up questions, demonstrating that while low-cost LLMs are effective for classification, generating high-quality questions r…
This study finds that when users do not specify a jurisdiction, the language used in the prompt strongly biases the LLM's response toward a specific national legal framework (U.S. for English, China f…
Zhihao Chen, Ying Zhang, Yi Liu, Gelei Deng +6 more
This study conducts a large-scale empirical analysis of third-party LLM agent skills, identifying that credential leakage is a pervasive, cross-modal issue primarily caused by debug logging and result…
This study benchmarks four local LLMs for natural-language-to-SQL querying in biopharma manufacturing, finding that general-purpose code-tuned models like Llama 3.1 8B and Qwen 2.5 Coder 7B outperform…
Yuxuan Liu, Zhaochen Su, Lingyun Xie, Yuhao Zhang +10 more
SkillRevise is an execution-grounded framework that iteratively refines initial, imperfect LLM agent skills by diagnosing defects from execution evidence and applying empirically validated edits, sign…
Xinyang Lu, Jiabao Pan, Rachael Hwee Ling Sim, See-Kiong Ng +2 more
The paper proposes DareU, a novel LLM unlearning framework that optimizes unlearning by zeroing out data attribution scores instead of maximizing prediction loss, achieving effective unlearning while…
This paper empirically evaluates LLM-generated reviews for academic papers, finding that while LLM reviews show some alignment with human ones, authors can effectively 'game' the system using iterativ…