ArXivCSExplorer
☆☆Bookmarks🏆RSSHow to UseFAQ
Built with and by Teycir Ben Soltane•
How to Use•FAQ•GitHub•arXiv.org•
Share:

20 results for “Basic knowledge of machine learning and large language models”

CS papers only

Hybrid search: Keyword + semantic, ranked by combined score.ⓘ

Want pure semantic search? Try claim verification →

cs.CLcs.AIcs.DSRecentMay 29, 2026

Neuro-symbolic Syntactic Parsing: Shaping a Neural Network with the CYK Algorithm

Fabio Massimo Zanzotto, Federico Ranaldi, Giorgio Satta

The paper proposes CYKNN, a novel recurrent neural network architecture that directly encodes the CYK parsing algorithm, demonstrating superior performance over large language models on syntactic pars…

View →
cs.CLcs.AIcs.LGRecentJun 1, 2026

ProbeScale: Probing Analysis to Optimize Neural Scaling Laws for Efficient Small Language Model Inference

Sourav Das

ProbScale is a novel framework that combines neural scaling laws and language model probing to identify highly efficient, task-specific subnetworks within pre-trained Small Language Models, achieving…

View →
cs.CLcs.AIcs.LGRecentMay 27, 2026

Extracting Small Translation Specialists from LLMs by Aggressively Pruning Experts

Liu O. Martin, Lucas Bandarkar, Nanyun Peng

The paper proposes an aggressive, parameter-efficient method to prune non-essential experts from Mixture-of-Experts (MoE) LLMs, significantly compressing the model while maintaining high machine trans…

View →
cs.AIcs.CLRecentMay 28, 2026

Demystifying Data Organization for Enhanced LLM Training

Yalun Dai, Yangyu Huang, Tongshen Yang, Yonghan Wang +7 more

This paper proposes four guidelines and two novel data ordering methods (STR and SAW) to systematically optimize data organization, significantly enhancing the stability and performance of LLM trainin…

View →
cs.LGcs.AIRecentMay 27, 2026

Learning the Error Patterns of Language Models

Jinwoo Kim, Taylor Berg-KirkPatrick, Loris D'Antoni

The paper introduces prefix filters and an algorithm (Palla) to systematically learn and apply specific error patterns in Large Language Models, significantly improving constrained generation tasks li…

View →
cs.DScs.AIcs.CLRecentMay 28, 2026

On Language Generation in the Limit with Bounded Memory

Jon Kleinberg, Anay Mehrotra, Amin Saberi, Grigoris Velegkas

The paper analyzes language generation and identification in the limit under bounded memory, showing that memory constraints significantly alter learnability, particularly affecting achievable density…

View →
cs.CRcs.AIRecentMay 7, 2026

Fine-Tuning Small Language Models for Solution-Oriented Windows Event Log Analysis

Siraaj Akhtar, Saad Khan, Simon Parkinson

This paper demonstrates that fine-tuning small language models (SLMs) on a synthetic, solution-rich Windows event log dataset allows them to outperform larger LLMs in identifying issues and providing…

View →
cs.CRcs.AIRecentJun 1, 2026

Large Byte Model: Teaching Language Models About Compiled Code

Florian Störtz, Catalin-Andrei Stan, Alexandru Dinu, Sandra Servia-Rodríguez +3 more

The paper introduces the first byte-native Large Language Model (LLM) capable of analyzing raw executable binary data, achieving high accuracy in tasks like malware and architecture classification.

View →
eess.AScs.AIcs.SDRecentMay 29, 2026

A Unified and Reproducible Experimentation Framework for Speech Understanding

Jing Peng, Junhao Du, Chenghao Wang, Hanqi Li +20 more

The paper introduces SURE, a unified framework designed to standardize and improve the comparability and reproducibility of evaluations for advanced speech understanding models.

View →
cs.CLcs.AIcs.LGEmpiricalRecentJun 11, 2026

SkMTEB: Slovak Massive Text Embedding Benchmark and Model Adaptation

Marek Šuppa, Andrej Ridzik, Daniel Hládek, Natália Kňažeková +1 more

This paper introduces SkMTEB, a comprehensive text embedding benchmark for Slovak, and develops efficient, locally-deployable Slovak embeddings.

View →
cs.AIRecentMay 31, 2026

DAG-MoE: From Simple Mixture to Structural Aggregation in Mixture-of-Experts

Jiarui Feng, Hanqing Zeng, Karish Grover, Ruizhong Qiu +10 more

The paper proposes DAG-MoE, a novel sparse Mixture-of-Experts framework that replaces standard weighted-sum aggregation with structural aggregation to enhance model performance and enable multi-step r…

View →
cs.AIcs.CLRecentMay 27, 2026

The Importance of Being Statistically Earnest: A Critical Re-evaluation of GSM-Symbolic

Dominika Agnieszka Długosz, Arlindo Oliveira, Natalia Díaz-Rodríguez

The paper challenges the conclusion that LLMs lack reasoning by demonstrating that reported performance drops on GSM-Symbolic are often statistically weak and partially attributable to dataset biases,…

View →
cs.CLcs.AIRecentMay 30, 2026

Revisiting Parameter-Based Knowledge Editing in Large Language Models: Theoretical Limits and Empirical Evidence

Wanying Ren, Xin Song, Futing Wang, Guoxiu He +1 more

The paper theoretically analyzes the limitations of parameter-based knowledge editing and empirically demonstrates that these methods consistently damage core LLM capabilities compared to retrieval-ba…

View →
cs.CLRecentMay 29, 2026

How Much Do LLMs Know About Chinese Zero Pronouns?

Yifei Li, Guanyi Chen, Tingting He

This paper systematically investigates the difficulty of Chinese Zero Pronouns (ZPs) for various LLMs, concluding that ZPs remain a significant and persistent challenge, with state-of-the-art models p…

View →
cs.CLcs.AIcs.LGRecentMay 28, 2026

Data filtering methods for training language models

Egor Shevchenko, Elena Bruches

This paper comparatively analyzes two automatic label error detection methods, Confident Learning and Dataset Cartography, demonstrating that targeted data filtering significantly improves model perfo…

View →
cs.CLcs.AIRecentMay 29, 2026

D$^3$: Dynamic Directional Graph-Constrained Data Scheduling for LLM Training

Yuanjian Xu, Jianing Hao, Guang Zhang, Zhong Li

The paper proposes $D^3$, a dynamic graph-constrained scheduling framework that optimizes LLM training order by modeling sample interactions as a dynamic influence graph.

View →
cs.SEcs.CRRecentApr 1, 2026

SERSEM: Selective Entropy-Weighted Scoring for Membership Inference in Code Language Models

Kıvanç Kuzey Dikici, Serdar Kara, Semih Çağlar, Eray Tüzün +1 more

SERSEM introduces a selective entropy-weighted scoring framework to significantly improve Membership Inference Attacks (MIAs) against code LLMs by focusing on human-centric coding anomalies rather tha…

View →
cs.CLRecentMay 29, 2026

Mellum2 Technical Report

Marko Kojic, Ivan Bondyrev, Aral de Moor, Joseph Shtok +5 more

Mellum 2 is an open-weight 12B Mixture-of-Experts (MoE) language model specialized for software engineering, achieving performance competitive with larger models while maintaining the efficiency of a…

View →
cs.CLRecentMay 31, 2026

On the Generalization Gap in Self-Evolving Language Model Reasoning

Zhenting Qi, Susanna Maria Baby, Stefanie Anna Baby, Kan Yuan +4 more

The paper investigates the limits of self-evolution in LLM reasoning under closed-loop settings, finding that while self-improvement is significant, it consistently falls short of perfect oracle super…

View →
cs.DCcs.AIRecentJun 1, 2026

Not All Errors Are Equal: A Systematic Study of Error Propagation in Large Language Model Inference

Yafan Huang, Sheng Di, Guanpeng Li

This paper systematically studies how soft errors propagate during Large Language Model (LLM) inference using a novel fault-injection framework, providing critical insights and mitigation strategies f…

View →