20 results for “Long-context language modeling”
CS papers only. Hybrid search: keyword + semantic, ranked by combined score.
Yutong Wang, Xuebo Liu, Derek F. Wong, Zhilin Li +5 more
The paper introduces Loong, a novel human-like agent that significantly improves long document translation by adaptively selecting and utilizing optimal historical context using a specialized memory m…
Ghadir Alselwi, Basem Suleiman, Hao Xue, Shoaib Jameel +3 more
This paper introduces KGERMAR, a framework that constructs dynamic, context-specific knowledge graphs during inference for long-context language modeling, achieving lower perplexity and better memory…
LongAttnComp introduces a novel, two-stage fine-tuning framework for context compression that significantly improves long-context reasoning performance, matching or exceeding full-context accuracy on…
The paper proposes Periodic RoPE (P-RoPE) combined with a dual-layer attention mechanism to overcome the positional encoding limitations of LLMs, enabling theoretically infinite context understanding.
Heyang Liu, Ziyang Cheng, Jiayi Huang, Wenyang Xiao +4 more
The paper proposes LaSR, a context-aware training paradigm that uses latent reasoning to significantly improve speech recognition, especially for specialized terminology, without adding latency.
LongTraceRL addresses long-context reasoning challenges by generating highly challenging training data and introducing a fine-grained rubric reward, significantly improving evidence-grounded reasoning…
Han Zhang, Zihao Tang, Xin Yu, Xiao Liu +7 more
The paper introduces RHELM, a new benchmark designed to test LLMs' long-term memory by simulating realistic, complex, and evolving dialogues that integrate multiple heterogeneous data sources.
The paper introduces MIDI, a novel multilingual dataset that embeds idioms in realistic sentence and conversational contexts across diverse resource levels, revealing that idiom comprehension is signi…
The paper proposes EPIC, an efficient, parallel decoding framework that significantly speeds up constraining diffusion language model outputs with context-free grammars (CFGs).
The paper introduces and evaluates five parameter alignment strategies that significantly mitigate catastrophic forgetting when continually pretraining multilingual expert language models across multi…
The paper proposes an aggressive, parameter-efficient method to prune non-essential experts from Mixture-of-Experts (MoE) LLMs, significantly compressing the model while maintaining high machine trans…
ParaTool introduces a novel framework that shifts tool representations from bulky context documentation to dedicated, loadable parameters, enabling efficient and robust tool calling in LLMs.
The paper proposes VISTA, a multi-level event semantics mining framework, to accurately predict complex events in long videos, addressing the limitations of current LLMs in this domain.
Marek Šuppa, Andrej Ridzik, Daniel Hládek, Natália Kňažeková +1 more
This paper introduces SkMTEB, a comprehensive text embedding benchmark for Slovak, and develops efficient, locally-deployable Slovak embeddings.
Soft-NBCE introduces soft entropy-weighted chunk fusion to overcome the semantic fragmentation caused by hard chunk selection in long-context LLMs, significantly improving performance on multi-hop ben…
This paper demonstrates that fine-tuning small language models (SLMs) on a synthetic, solution-rich Windows event log dataset allows them to outperform larger LLMs in identifying issues and providing…
The paper introduces CFGzip, an offline token space compression technique that significantly reduces the computational overhead of constrained decoding, making complex grammar enforcement feasible at…
CART introduces a parameter-efficient recurrent transformer architecture that reuses a core block multiple times, but its performance does not surpass a dense baseline, suggesting that weight sharing…
The paper introduces PortBERT, a family of RoBERTa-based language models for Portuguese, which achieves competitive performance while explicitly balancing efficiency and accuracy.
This paper proposes a multi-turn retrieval-augmented generation pipeline for conversational systems across four domains.