Papers similar to 2606.00888

~ similar to 2606.00888· 20 results

cs.LGcs.AIRecentMay 31, 2026

When Data Is Scarce: Scaling Sparse Language Models with Repeated Training

Boqian Wu, Qiao Xiao, Patrik Okanovic, Tomasz Sternal +5 more

This paper introduces a new scaling law for sparse language models trained with limited data, demonstrating that sparsity can significantly improve performance and delay data saturation during multi-e…

View →

cs.CLcs.AIRecentJun 1, 2026

From Layers to Submodules: Rethinking Granularity in Replacement-Based LLM Compression

Elia Cunegatti, Marcus Vukojevic, Erik Nielsen, Giovanni Iacca

The paper proposes SubFit, a novel compression technique that achieves superior LLM compression by replacing non-contiguous, submodule-level components (Attention and FeedForward) with lightweight res…

View →

cs.CLcs.AIcs.LGRecentJun 4, 2026

You Only Index Once: Cross-Layer Sparse Attention with Shared Routing

Yutao Sun, Yanqi Zhang, Li Dong, Jianyong Wang +1 more

The paper proposes Cross-Layer Sparse Attention (CLSA) to significantly improve the efficiency and accuracy of long-context LLMs by jointly optimizing KV-cache sharing and the routing index across dec…

View →

cs.CLcs.AIRecentMay 29, 2026

D$^3$: Dynamic Directional Graph-Constrained Data Scheduling for LLM Training

Yuanjian Xu, Jianing Hao, Guang Zhang, Zhong Li

The paper proposes $D^3$, a dynamic graph-constrained scheduling framework that optimizes LLM training order by modeling sample interactions as a dynamic influence graph.

View →

cs.LGcs.AIRecentMay 27, 2026

Efficient Pre-Training of LLMs through Truncated SVD Layers

Kaivan Kamali, Kajetan Schweighofer, Hormoz Shahrzad, Olivier Francon +2 more

The paper introduces TSVD, a novel framework that efficiently pre-trains LLMs by enforcing both low rank and strict weight orthonormality, achieving performance comparable to full-parameter models wit…

View →

cs.LGcs.AIEmpiricalComprehensiveRecentJun 4, 2026

Pretraining Recurrent Networks without Recurrence

Akarsh Kumar, Phillip Isola

This paper proposes Supervised Memory Training (SMT), a method for training nonlinear RNNs that sidesteps recurrent credit propagation entirely.

View →

cs.LGcs.AIEmpiricalComprehensiveRecentJun 4, 2026

Pretraining Recurrent Networks without Recurrence

Akarsh Kumar, Phillip Isola

This paper proposes Supervised Memory Training (SMT), a method for training nonlinear RNNs that sidesteps recurrent credit propagation entirely.

View →

cs.AIRecentMay 28, 2026

Moment-KV: Momentum-Based Decode-Time KV Cache Compression for Long Generation

Soumyadeep Jana, Sagar Nishad, Sanasam Ranbir Singh

Moment-KV introduces a novel momentum-based technique to compress the Key-Value (KV) cache during the decoding phase of LLM generation, significantly improving fidelity in long-generation tasks.

View →

cs.CLRecentJun 1, 2026

CRAM: Centroid-Routing and Adaptive MoE for Multimodal Continual Instruction Tuning

Jun-Tao Tang, Zhen-Hao Xie, Yu-Cheng Shi, Da-Wei Zhou

CRAM proposes a novel framework for Multimodal Continual Instruction Tuning that balances task isolation and parameter efficiency by using centroid-guided routing and adaptive MoE to prevent catastrop…

View →

cs.AIRecentJun 1, 2026

Does Compression Preserve Uncertainty? A Unified Benchmark for Quantized and Sparse LLMs via Conformal Prediction

Yujia Tong, Yuxi Wang, Yunyang Wan, Tian Zhang +2 more

This paper investigates whether model compression techniques (like quantization and pruning) preserve a Large Language Model's ability to quantify its own uncertainty, finding that accuracy-only evalu…

View →

cs.CLcs.AIcs.CVRecentMay 28, 2026

How LoRA Remembers? A Parametric Memory Law for LLM Finetuning

Ziwen Xu, Haiwen Hong, Linsong Yu, Benglei Cui +3 more

The paper quantifies the exact parametric memory capacity of LLMs using LoRA and proposes a new optimization strategy, MemFT, to enhance memory fidelity.

View →

cs.ARRecentMay 29, 2026

SPARQLe: Sub-Precision Activation Representation for Quantized LLM Inference

Aradhana Mohan Parvathy, Soumendu Kumar Ghosh, Shamik Kundu, Arnab Raha +3 more

SPARQLe is a hardware-software co-design framework that exploits the inherent sub-precision sparsity of LLM activations to reduce memory traffic and enable efficient computation on lower-bit datapaths…

View →

cs.CLRecentMay 29, 2026

GRKV: Global Regression for Training-Free KV Cache Compression in Long-Context LLMs

Junjie Peng, You Wu, Haoyi Wu, Jialong Han +3 more

GRKV introduces a training-free KV-cache merging method that uses global regression to distribute information from evicted tokens, solving the over-merging problem inherent in span-based retention.

View →

cs.AIRecentMay 28, 2026

Domain-Specific Data Synthesis for LLMs via Minimal Sufficient Representation Learning

Tong Ye, Hang Yu, Tengfei Ma, Xuhong Zhang +5 more

The paper introduces DOMINO, a novel inductive framework that synthesizes domain-specific data for LLMs using only reference examples, significantly improving performance on challenging, implicitly de…

View →

cs.LGcs.AIcs.CLRecentApr 20, 2026

Towards Understanding the Robustness of Sparse Autoencoders

Ahson Saiyed, Sabrina Sadiekh, Chirag Agarwal

The paper demonstrates that integrating Sparse Autoencoders (SAEs) into transformer residual streams significantly enhances the robustness of Large Language Models against various jailbreak attacks by…

View →

cs.LGcs.AIRecentMay 29, 2026

BudgetDraft: Acceptance-Aware Multi-View Training for Sparse-KV Speculative Decoding

Liang He, Jingbo Wen, Qishi Zhan, Yixiong Chen +3 more

BudgetDraft introduces an acceptance-aware multi-view training method that trains a sparse-KV speculative decoder to maintain high acceptance rates across varying context lengths and sparsity levels,…

View →

cs.LGcs.CLRecentJun 3, 2026

STRIDE: Training Data Attribution via Sparse Recovery from Subset Perturbations

Rishit Dagli, Abir Harrasse, Luke Zhang, Florent Draye +3 more

This paper proposes a new framework called STRIDE for training data attribution in Large Language Models.

View →

cs.CLcs.AIRecentMay 27, 2026

PrunePath: Towards Highly Structured Sparse Language Models

Zhexuan Gu, Zixun Fu, Yancheng Yuan

PrunePath introduces a budget-adaptive structured sparsification framework that efficiently prunes Feed-forward networks in large language models, achieving hardware-friendly sparsity and measurable s…

View →

cs.AIcs.LGRecentMay 27, 2026

Zipping the Thought: When and How Compressed Reasoning Data Works in LLM Post-Training

Kohsei Matsutani, Gouki Minegishi, Takeshi Kojima, Yusuke Iwasawa +1 more

This paper investigates how different types of compressed reasoning data (Explicit, Composed, Implicit CoT) affect LLM performance during post-training, finding that the choice of compression and subs…

View →

cs.LGcs.AIRecentMay 27, 2026

Locality-Aware Redundancy Pruning for LLM Depth Compression

Vincent-Daniel Yun, Youngrae Kim, Woosang Lim, YoungJin Heo +2 more

The paper proposes Locality-Aware Redundancy Pruning (LoRP), a training-free method that prunes LLM layers by exploiting localized inter-layer redundancy, leading to improved efficiency while maintain…

View →