Papers similar to 2606.06479

20 results

cs.LG, cs.AI · Recent · May 30, 2026

Memory-Efficient LLM Training with Dynamic Sparsity: From Stability to Practical Scaling

Qiao Xiao, Boqian Wu, Patrik Okanovic, Tomasz Sternal +5 more

The paper introduces Sparse Memory-Efficient Training (SMET), a method that stabilizes and optimizes Dynamic Sparse Training (DST) for large language models, enabling stable and memory-efficient spars…

cs.AI, cs.LG · Recent · May 30, 2026

SHARP: Sleep-based Hierarchical Accelerated Replay for Long Range Non-Stationary Temporal Pattern Recognition

Jayanta Dey, Shikhar Srivastava, Itamar Lerner, Christopher Kanan +1 more

SHARP proposes a novel sleep-based hierarchical replay framework to efficiently learn long-range non-stationary temporal patterns in streaming data, achieving improved context retention and predictive…

cs.LG, cs.CL · Recent · May 31, 2026

CART: Context-Anchored Recurrent Transformer -- A Parameter-Efficient Architecture with Learned Stability

Chad A. Capps

CART introduces a parameter-efficient recurrent transformer architecture that reuses a core block multiple times, but its performance does not surpass a dense baseline, suggesting that weight sharing…

cs.LG, cs.AI · Recent · Jun 2, 2026

Language Models Need Sleep: Learning to Self-Modify and Consolidate Memories

Ali Behrouz, Farnoosh Hashemi, Vahab Mirrokni

This paper introduces a 'Sleep' paradigm for machine learning models to continually learn and transfer knowledge.

cs.CL, cs.AI, cs.CV · Recent · May 28, 2026

How LoRA Remembers? A Parametric Memory Law for LLM Finetuning

Ziwen Xu, Haiwen Hong, Linsong Yu, Benglei Cui +3 more

The paper quantifies the exact parametric memory capacity of LLMs using LoRA and proposes a new optimization strategy, MemFT, to enhance memory fidelity.

cs.CL, cs.AI · Recent · May 28, 2026

Unlocking the Working Memory of Large Language Models for Latent Reasoning

Lukas Aichberger, Sepp Hochreiter

The paper introduces Reasoning in Memory (RiM), a latent reasoning method that replaces autoregressive token generation with fixed memory blocks to enable compute-efficient internal working memory for…

cs.CL, cs.CR, cs.LG · Recent · Apr 3, 2026

Learning the Signature of Memorization in Autoregressive Language Models

David Ilić, Kostadin Cvejoski, David Stanojević, Evgeny Grigorenko

The paper introduces a novel, transferable learned attack (LT-MIA) that detects a universal 'signature of memorization' in language models, achieving high accuracy across diverse model architectures (…

cs.LG, cs.CL · Recent · May 29, 2026

Trading Complexity for Expressivity Through Structured Generalized Linear Token Mixing

Erwan Fagnou, Paul Caillon, Blaise Delattre, Alexandre Allauzen

The paper proposes a unified framework for designing efficient and expressive token mixing layers by separating the direct and recurrent influences of inputs, allowing for a principled trade-off betwe…

cs.AI, cs.CL · Recent · Jun 1, 2026

AGENTCL: Toward Rigorous Evaluation of Continual Learning in Language Agents

Yiheng Shu, Bernal Jiménez Gutiérrez, Saisri Padmaja Jonnalagedda, Yuguang Yao +2 more

The paper introduces AGENTCL, a rigorous evaluation framework that uses controlled task streams to accurately measure an agent's ability to accumulate and reuse knowledge across multiple tasks, thereb…

cs.CL, cs.AI, cs.LG · Recent · May 29, 2026

Not All Synthetic Data Is Yours to Learn From

Sina Alemohammad, Li Chen, Richard G. Baraniuk, Zhangyang Wang

Weak self-training on synthetic data can amplify a language model's existing capabilities, but this effect is strictly dependent on the compatibility between the source and student models, not on the…

cs.LG, cs.CL · Recent · May 30, 2026

Task Structure Reverses Layerwise State Encoding in Sequence Models

Yuhang Jiang

The paper demonstrates that the location and nature of state encoding in sequence models are not fixed architectural traits but are highly dependent on the specific task, showing that the encoding pro…

cs.CL · Recent · May 29, 2026

Parameter Alignment Mitigates Catastrophic Forgetting in Multilingual Expert Language Models

Sanchit Ahuja, Terra Blevins

The paper introduces and evaluates five parameter alignment strategies that significantly mitigate catastrophic forgetting when continually pretraining multilingual expert language models across multi…

cs.LG, cs.AI, cs.CR · Recent · Jun 2, 2026

PURGE: Projected Unlearning via Retain-Guided Erasure

Vedant Jawandhia, Daksh Ahuja, Ghufran Alam Siddiqui, Prashant Trivedi +2 more

PURGE is a novel machine unlearning algorithm that leverages the duality between continual learning and unlearning to achieve high data retention while making the unlearned model indistinguishable fro…

cs.LG, cs.AI · Recent · May 31, 2026

When Data Is Scarce: Scaling Sparse Language Models with Repeated Training

Boqian Wu, Qiao Xiao, Patrik Okanovic, Tomasz Sternal +5 more

This paper introduces a new scaling law for sparse language models trained with limited data, demonstrating that sparsity can significantly improve performance and delay data saturation during multi-e…

cs.LG, cs.AI, stat.ML · Recent · May 29, 2026

Why Linear Recurrent Memory Works in Partially Observable Reinforcement Learning

Yike Zhao, Onno Eberhard, Malek Khammassi, Ali H. Sayed +1 more

This paper theoretically justifies the strong performance of linear recurrent neural networks as memory units in partially observable reinforcement learning by constructing specific linear filters tha…

cs.AI · Recent · May 28, 2026

Moment-KV: Momentum-Based Decode-Time KV Cache Compression for Long Generation

Soumyadeep Jana, Sagar Nishad, Sanasam Ranbir Singh

Moment-KV introduces a novel momentum-based technique to compress the Key-Value (KV) cache during the decoding phase of LLM generation, significantly improving fidelity in long-generation tasks.

cs.CL, cs.AI, cs.LG · Recent · May 27, 2026

MemGuard: Preventing Memory Contamination in Long-Term Memory-Augmented Large Language Models

Hyeonjeong Ha, Jeonghwan Kim, Cheng Qian, Jiayu Liu +6 more

MemGuard introduces a type-aware memory framework to prevent heterogeneous memory contamination in long-term memory-augmented LLMs, significantly improving memory reliability and efficiency.

cs.AI · Recent · Jun 1, 2026

eMoT: evolving Memory-of-Thought via Symbolic Anchoring and Memory Corrosion

Xiang Li, Jiwei Wei, Ke Liu, Yitong Qin +4 more

The eMoT framework enhances multi-step reasoning in LLMs by treating reasoning as an evolving memory, stabilizing performance through symbolic computation and structured refinement.

cs.LG, cs.CC · Recent · Jun 1, 2026

Rethinking the Role of Positional Encoding: Sliding-Window Transformers without PE Remain Turing Complete

Qian Li, Xinyu Mao, Shang-Hua Teng

The paper demonstrates that positional encodings are not necessary for transformers to achieve universal computation, showing that the inherent mechanism of sliding context windows already provides su…

cs.AI · Recent · May 28, 2026

Meta-Cognitive Memory Policy Optimization for Long-Horizon LLM Agents

Ziyan Liu, Zhezheng Hao, Yeqiu Chen, Hong Wang +6 more

The paper introduces Metacognitive Memory Policy Optimization (MMPO), a novel memory training approach that optimizes LLM memory not based on final task success, but on minimizing epistemic uncertaint…
