~ similar to 2606.00732· 15 results
This paper proposes Supervised Memory Training (SMT), a method for training nonlinear RNNs that sidesteps recurrent credit propagation entirely.
This paper proposes Supervised Memory Training (SMT), a method for training nonlinear RNNs that sidesteps recurrent credit propagation entirely.
This paper introduces a 'Sleep' paradigm for machine learning models to continually learn and transfer knowledge.
Qiao Xiao, Boqian Wu, Patrik Okanovic, Tomasz Sternal +5 more
The paper introduces Sparse Memory-Efficient Training (SMET), a method that stabilizes and optimizes Dynamic Sparse Training (DST) for large language models, enabling stable and memory-efficient spars…
CaMBRAIN introduces a novel Mamba-based State Space Model (SSM) for real-time, continuous EEG inference, achieving state-of-the-art results with significantly higher throughput than existing methods.
Ziwen Xu, Haiwen Hong, Linsong Yu, Benglei Cui +3 more
The paper quantifies the exact parametric memory capacity of LLMs using LoRA and proposes a new optimization strategy, MemFT, to enhance memory fidelity.
Xiaosong Han, Ke Chen, Xindi Dai, Di Liang +6 more
TRACE proposes a novel method to mitigate catastrophic forgetting in continual LLM fine-tuning by identifying and isolating a small, task-specific subset of essential parameters for each task.
Riju Marwah, Ritvik Garimella, Vishal Pallagani, Atishay Jain +2 more
The paper formalizes LLM degradation during long generation as 'cognitive fatigue' and introduces the Fatigue Index (FI), a measurable, model-agnostic diagnostic tool for real-time monitoring.
Yibo Wang, Nikki Lijing Kuang, Philip S. Yu, Zhewei Yao +1 more
The paper proposes MERIT, a dual-level, multi-horizon memory retrieval framework that significantly improves the performance of interactive text-to-SQL agents by providing both global and local memory…
The paper demonstrates that the location and nature of state encoding in sequence models are not fixed architectural traits but are highly dependent on the specific task, showing that the encoding pro…
FreqLite introduces an ultra-lightweight, frequency-decomposed linear model that significantly outperforms complex transformers on long-term time-series forecasting while drastically reducing computat…
Xiaoyang Jiang, Yanlai Yang, Kenneth A. Norman, Brenden Lake +1 more
The paper introduces BabyCL, a continual multimodal learning framework that processes egocentric video data in a single chronological pass, demonstrating that meaningful word-referent mappings can be…
Chen Yang, Chufan Yu, Hanfu Chen, Jie Zhu +21 more
MOSS-Audio is a unified audio-language model designed for comprehensive understanding of speech, environmental sounds, and music, achieving strong performance across various audio-grounded tasks.
The paper identifies 'memory-induced tool-drift,' a systematic vulnerability where personality biases stored in an LLM agent's memory silently corrupt tool-calling decisions, even when those biases ar…
The paper introduces AGENTCL, a rigorous evaluation framework that uses controlled task streams to accurately measure an agent's ability to accumulate and reuse knowledge across multiple tasks, thereb…