~ similar to 2605.30260· 20 results
The paper introduces NaRA, a noise-aware LoRA technique that dynamically adapts fine-tuning parameters based on the noise level during diffusion, significantly improving the performance of Diffusion L…
CRAM proposes a novel framework for Multimodal Continual Instruction Tuning that balances task isolation and parameter efficiency by using centroid-guided routing and adaptive MoE to prevent catastrop…
Tao Feng, Chongrui Ye, Tianyang Luo, Jingjun Xu +4 more
ElasticMem introduces a novel framework that treats memory as an elastic latent resource, allowing LLM agents to adaptively manage and inject variable-budget memories for improved performance in long-…
The paper proposes SAGE, a novelty-aware gate that efficiently controls memory updates in agentic LLMs by classifying new facts as clearly novel, clearly redundant, or uncertain, thereby significantly…
Qiao Xiao, Boqian Wu, Patrik Okanovic, Tomasz Sternal +5 more
The paper introduces Sparse Memory-Efficient Training (SMET), a method that stabilizes and optimizes Dynamic Sparse Training (DST) for large language models, enabling stable and memory-efficient spars…
Yifei Zuo, Dhruv Pai, Zhichen Zeng, Alec Dewulf +2 more
The paper introduces Parallax, a scalable and numerically stable parameterized Local Linear Attention mechanism that significantly improves LLM performance and efficiency compared to existing methods…
The paper proposes SubFit, a novel compression technique that achieves superior LLM compression by replacing non-contiguous, submodule-level components (Attention and FeedForward) with lightweight res…
Physical AI inference (batch-1 decode) is primarily memory-bandwidth-bound, but the observed latency gap between fast and slow GPUs is not solely due to memory bandwidth, as launch-side overheads beco…
This paper proposes Supervised Memory Training (SMT), a method for training nonlinear RNNs that sidesteps recurrent credit propagation entirely.
This paper proposes Supervised Memory Training (SMT), a method for training nonlinear RNNs that sidesteps recurrent credit propagation entirely.
Ziyang Zheng, Zeju Li, Xiangyu Wen, Jianyuan Zhong +4 more
The paper reframes context distillation as a latent memory management problem, proposing a modular framework using LoRA adapters and a Self-Gating mechanism for efficient, selective memory retrieval a…
The paper systematically evaluates various defense mechanisms against persistent memory attacks on LLM agents, finding that only tool-gating at the memory layer (Memory Sandbox) effectively mitigates…
This paper introduces Anchored Weight Decay (AWD), a regularization technique that effectively prevents prior-task forgetting during LLM fine-tuning with Evolution Strategies (ES), positioning ES as a…
The paper introduces and evaluates five parameter alignment strategies that significantly mitigate catastrophic forgetting when continually pretraining multilingual expert language models across multi…
The paper introduces a novel, transferable learned attack (LT-MIA) that detects a universal 'signature of memorization' in language models, achieving high accuracy across diverse model architectures (…
Xinle Deng, Ruobin Zhong, Hujin Peng, Xiaoben Lu +14 more
The paper introduces MemTrace, a framework that treats LLM memory pipelines as traceable graphs to systematically diagnose and automatically correct memory-related errors, boosting performance by up t…
The paper introduces TSVD, a novel framework that efficiently pre-trains LLMs by enforcing both low rank and strict weight orthonormality, achieving performance comparable to full-parameter models wit…
The paper demonstrates that the location and nature of state encoding in sequence models are not fixed architectural traits but are highly dependent on the specific task, showing that the encoding pro…
Xiaosong Han, Ke Chen, Xindi Dai, Di Liang +6 more
TRACE proposes a novel method to mitigate catastrophic forgetting in continual LLM fine-tuning by identifying and isolating a small, task-specific subset of essential parameters for each task.
Weifang Zhang, Yuzhou Nie, Bowen Pang, Guangrui Ma +1 more
This paper proposes a hybrid scheduler that dynamically switches between exclusive batching and mixed batching for LLM inference, achieving superior throughput, especially on bandwidth-constrained GPU…