ArXivCSExplorer
☆☆Bookmarks🏆RSSHow to UseFAQ
Built with and by Teycir Ben Soltane•
How to Use•FAQ•GitHub•arXiv.org•
Share:

~ similar to 2605.30324· 20 results

cs.LGcs.CLcs.CRRecentApr 8, 2026

On the Price of Privacy for Language Identification and Generation

Xiaoyu Li, Andi Han, Jiaojiao Jiang, Junbin Gao

The paper quantifies the cost of privacy in language identification and generation using differentially private (DP) methods, finding that the cost is surprisingly mild, particularly absent under appr…

View →
cs.LGcs.AIRecentMay 31, 2026

When Data Is Scarce: Scaling Sparse Language Models with Repeated Training

Boqian Wu, Qiao Xiao, Patrik Okanovic, Tomasz Sternal +5 more

This paper introduces a new scaling law for sparse language models trained with limited data, demonstrating that sparsity can significantly improve performance and delay data saturation during multi-e…

View →
cs.CLRecentMay 31, 2026

On the Generalization Gap in Self-Evolving Language Model Reasoning

Zhenting Qi, Susanna Maria Baby, Stefanie Anna Baby, Kan Yuan +4 more

The paper investigates the limits of self-evolution in LLM reasoning under closed-loop settings, finding that while self-improvement is significant, it consistently falls short of perfect oracle super…

View →
cs.LGcs.CLRecentMay 29, 2026

Trading Complexity for Expressivity Through Structured Generalized Linear Token Mixing

Erwan Fagnou, Paul Caillon, Blaise Delattre, Alexandre Allauzen

The paper proposes a unified framework for designing efficient and expressive token mixing layers by separating the direct and recurrent influences of inputs, allowing for a principled trade-off betwe…

View →
cs.CLcs.AIRecentMay 28, 2026

Unlocking the Working Memory of Large Language Models for Latent Reasoning

Lukas Aichberger, Sepp Hochreiter

The paper introduces Reasoning in Memory (RiM), a latent reasoning method that replaces autoregressive token generation with fixed memory blocks to enable compute-efficient internal working memory for…

View →
cs.LGcs.CCRecentJun 1, 2026

Rethinking the Role of Positional Encoding: Sliding-Window Transformers without PE Remain Turing Complete

Qian Li, Xinyu Mao, Shang-Hua Teng

The paper demonstrates that positional encodings are not necessary for transformers to achieve universal computation, showing that the inherent mechanism of sliding context windows already provides su…

View →
cs.AIRecentMay 28, 2026

Domain-Specific Data Synthesis for LLMs via Minimal Sufficient Representation Learning

Tong Ye, Hang Yu, Tengfei Ma, Xuhong Zhang +5 more

The paper introduces DOMINO, a novel inductive framework that synthesizes domain-specific data for LLMs using only reference examples, significantly improving performance on challenging, implicitly de…

View →
cs.LGcs.AIEmpiricalComprehensiveRecentJun 4, 2026

Pretraining Recurrent Networks without Recurrence

Akarsh Kumar, Phillip Isola

This paper proposes Supervised Memory Training (SMT), a method for training nonlinear RNNs that sidesteps recurrent credit propagation entirely.

View →
cs.LGcs.AIEmpiricalComprehensiveRecentJun 4, 2026

Pretraining Recurrent Networks without Recurrence

Akarsh Kumar, Phillip Isola

This paper proposes Supervised Memory Training (SMT), a method for training nonlinear RNNs that sidesteps recurrent credit propagation entirely.

View →
cs.AIRecentMay 28, 2026

Make LLM Learn to Synthesize from Streaming Experiences through Feedback

Zhenlin Hu, Yan Wang, Zhen Bi, Zihao Xue +6 more

The paper introduces StreamSynth, a sequential setting for synthetic data generation, and proposes SynLearner, a framework that enables LLMs to improve synthesis performance by accumulating and transf…

View →
cs.ITcs.AIcs.LGRecentMay 30, 2026

Information-Theoretic Lower Bounds for Bit-Constrained Stochastic Optimization via a Reduction to Compressed Gaussian Mean Estimation

Munsik Kim

The paper establishes information-theoretic lower bounds for stochastic optimization using low-bit gradients by reducing the problem to compressed Gaussian mean estimation, yielding sharp bounds on co…

View →
cs.PLcs.CCcs.DBRecentJun 1, 2026

From Time to Space: The Impact of Linearity in Higher-Order Datalog

Angelos Charalambidis, Babis Kostopoulos, Panos Rondogiannis

The paper analyzes a fragment of Higher-Order Datalog, showing that restricting recursion to a linear form shifts its expressive power from time complexity to space complexity, specifically capturing…

View →
cs.LGcs.AIRecentMay 30, 2026

Memory-Efficient LLM Training with Dynamic Sparsity: From Stability to Practical Scaling

Qiao Xiao, Boqian Wu, Patrik Okanovic, Tomasz Sternal +5 more

The paper introduces Sparse Memory-Efficient Training (SMET), a method that stabilizes and optimizes Dynamic Sparse Training (DST) for large language models, enabling stable and memory-efficient spars…

View →
cs.LGcs.CLRecentMay 28, 2026

Bounded Behavioral Indistinguishability for Black-Box LLM Distillation

Munawar Hasan

The paper introduces and evaluates bounded behavioral indistinguishability, showing that while LoRA distillation improves semantic similarity, it does not guarantee that the student model is behaviora…

View →
cs.CLcs.AIRecentMay 29, 2026

D$^3$: Dynamic Directional Graph-Constrained Data Scheduling for LLM Training

Yuanjian Xu, Jianing Hao, Guang Zhang, Zhong Li

The paper proposes $D^3$, a dynamic graph-constrained scheduling framework that optimizes LLM training order by modeling sample interactions as a dynamic influence graph.

View →
cs.CLRecentMay 29, 2026

Parameter Alignment Mitigates Catastrophic Forgetting in Multilingual Expert Language Models

Sanchit Ahuja, Terra Blevins

The paper introduces and evaluates five parameter alignment strategies that significantly mitigate catastrophic forgetting when continually pretraining multilingual expert language models across multi…

View →
cs.CLcs.AIcs.CVRecentMay 28, 2026

How LoRA Remembers? A Parametric Memory Law for LLM Finetuning

Ziwen Xu, Haiwen Hong, Linsong Yu, Benglei Cui +3 more

The paper quantifies the exact parametric memory capacity of LLMs using LoRA and proposes a new optimization strategy, MemFT, to enhance memory fidelity.

View →
cs.FLcs.CLcs.LGRecentJun 1, 2026

An Algebraic View of the Expressivity of Recurrent Language Models

Franz Nowak, Ryan Cotterell, Reda Boumasmoud

The paper provides a unified algebraic framework to determine the formal language expressivity of recurrent neural language models, resolving conflicts in existing literature by linking expressivity t…

View →
cs.AIRecentMay 27, 2026

Efficient Post-training of LLMs for Code Generation With Offline Reinforcement Learning

Mingze Wu, Abhinav Anand, Shweta Verma, Mira Mezini

This paper proposes using offline reinforcement learning (RL) as an efficient alternative to online RL for post-training code-generating LLMs, demonstrating its effectiveness, especially for smaller m…

View →
cs.CLcs.AIcs.LGRecentMay 29, 2026

Not All Synthetic Data Is Yours to Learn From

Sina Alemohammad, Li Chen, Richard G. Baraniuk, Zhangyang Wang

Weak self-training on synthetic data can amplify a language model's existing capabilities, but this effect is strictly dependent on the compatibility between the source and student models, not on the…

View →