20 results for “Recurrent Neural Networks”
CS papers onlyHybrid search: Keyword + semantic, ranked by combined score.ⓘ
Want pure semantic search? Try claim verification →
This paper proposes Supervised Memory Training (SMT), a method for training nonlinear RNNs that sidesteps recurrent credit propagation entirely.
This paper investigates limitations of learning tanh neural networks under finite-precision computations and Lp accuracy guarantees.
The paper provides a unified algebraic framework to determine the formal language expressivity of recurrent neural language models, resolving conflicts in existing literature by linking expressivity t…
Yike Zhao, Onno Eberhard, Malek Khammassi, Ali H. Sayed +1 more
This paper theoretically justifies the strong performance of linear recurrent neural networks as memory units in partially observable reinforcement learning by constructing specific linear filters tha…
The paper proposes PG-RSSNN, a physics-guided recurrent state-space neural network that improves multi-step prediction stability and accuracy compared to both pure black-box and pure physics models, e…
SHARP proposes a novel sleep-based hierarchical replay framework to efficiently learn long-range non-stationary temporal patterns in streaming data, achieving improved context retention and predictive…
CART introduces a parameter-efficient recurrent transformer architecture that reuses a core block multiple times, but its performance does not surpass a dense baseline, suggesting that weight sharing…
The paper proposes CYKNN, a novel recurrent neural network architecture that directly encodes the CYK parsing algorithm, demonstrating superior performance over large language models on syntactic pars…
The paper proposes a unified framework for designing efficient and expressive token mixing layers by separating the direct and recurrent influences of inputs, allowing for a principled trade-off betwe…
The paper introduces QuITE, a plug-and-play embedding module that uses learnable query tokens to effectively embed irregular multivariate time series data into latent representations compatible with e…
Vincent-Daniel Yun, Youngrae Kim, Woosang Lim, YoungJin Heo +2 more
The paper proposes Locality-Aware Redundancy Pruning (LoRP), a training-free method that prunes LLM layers by exploiting localized inter-layer redundancy, leading to improved efficiency while maintain…
Qian Chang, Ciprian Doru Giurcaneanu, Runsong Jia, Xia Li +5 more
The paper proposes Dual-Scale Retentive Dynamics (DSRD), a unified framework that improves representation learning on dynamic graphs by jointly modeling evolving temporal and structural dependencies.
The paper introduces the Vector Network (VN), a novel recurrent architecture that replaces fixed weight matrices with reusable weight atoms, enabling superior compositional generalization by making st…
Zhi Zhou, Ming Yang, Shi-Yu Tian, Kun-Yang Yu +2 more
The paper establishes the first theoretical framework for analyzing the learnability of Test-Time Adaptation (TTA) under non-stationary data streams by introducing Recovery Complexity, which quantifie…
This paper introduces a 'Sleep' paradigm for machine learning models to continually learn and transfer knowledge.
The paper introduces a novel, non-deep neural network architecture that achieves the performance of LLMs by finding the global optimum of the loss function in a single, closed-form iteration, eliminat…
The paper demonstrates that the location and nature of state encoding in sequence models are not fixed architectural traits but are highly dependent on the specific task, showing that the encoding pro…
The paper demonstrates that an attention-augmented LSTM model can achieve near-perfect character-level decipherment of homophonic ciphertexts from historical English and Swedish, even under challengin…
The paper argues that shallow safety alignment in LLMs is due to autoregressive consistency, a mechanism that allows small harmful inputs to redirect the model's generation to unsafe outputs, necessit…
MARS proposes an encoder-agnostic aggregation operator that explicitly models multi-scale temporal structure in sequential recommendation, achieving state-of-the-art performance across both sparse and…