"Long-context language modeling"

20 results for “Long-context language modeling”

CS papers only

Hybrid search: keyword + semantic, ranked by combined score.

cs.CL, cs.AI | Recent | May 28, 2026

Loong: A Human-Like Long Document Translation Agent with Observe-and-Act Adaptive Context Selection

Yutong Wang, Xuebo Liu, Derek F. Wong, Zhilin Li +5 more

The paper introduces Loong, a novel human-like agent that significantly improves long document translation by adaptively selecting and utilizing optimal historical context using a specialized memory m…

cs.IR, cs.AI, cs.CL | Empirical | Recent | Jun 12, 2026

Knowledge Graph Enhanced Memory-Augmented Retrieval for Long Context Modeling

Ghadir Alselwi, Basem Suleiman, Hao Xue, Shoaib Jameel +3 more

This paper introduces KGERMAR, a framework that constructs dynamic, context-specific knowledge graphs during inference for long-context language modeling, achieving lower perplexity and better memory…

cs.CL | Recent | May 31, 2026

LongAttnComp: Cross-Family Context Compression for Long-Context Reasoning

Mengmeng Ji, Ravi Shanker Raju, Jonathan Lingjie Li, Chen Wu

LongAttnComp introduces a novel, two-stage fine-tuning framework for context compression that significantly improves long-context reasoning performance, matching or exceeding full-context accuracy on…

cs.CL, cs.AI | Recent | May 27, 2026

Periodic RoPE for Infinite Context LLMs

Simin Huo

The paper proposes Periodic RoPE (P-RoPE) combined with a dual-layer attention mechanism to overcome the positional encoding limitations of LLMs, enabling theoretically infinite context understanding.

cs.CL | Recent | May 30, 2026

LaSR: Context-Aware Speech Recognition via Latent Reasoning

Heyang Liu, Ziyang Cheng, Jiayi Huang, Wenyang Xiao +4 more

The paper proposes LaSR, a context-aware training paradigm that uses latent reasoning to significantly improve speech recognition, especially for specialized terminology, without adding latency.

cs.CL, cs.AI, cs.LG | Recent | May 29, 2026

LongTraceRL: Learning Long-Context Reasoning from Search Agent Trajectories with Rubric Rewards

Nianyi Lin, Jiajie Zhang, Lei Hou, Juanzi Li

LongTraceRL addresses long-context reasoning challenges by generating highly challenging training data and introducing a fine-grained rubric reward, significantly improving evidence-grounded reasoning…

cs.CL, cs.IR | Recent | May 29, 2026

Beyond Static Dialogues: Benchmarking Realistic, Heterogeneous, and Evolving Long-Term Memory

Han Zhang, Zihao Tang, Xin Yu, Xiao Liu +7 more

The paper introduces RHELM, a new benchmark designed to test LLMs' long-term memory by simulating realistic, complex, and evolving dialogues that integrate multiple heterogeneous data sources.

cs.CL, cs.AI | Recent | Jun 1, 2026

Multilingual Idioms in Sentences and Conversations Across High-, Medium-, and Low-Resource Languages

Saeed Almheiri, Bilal Elbouardi, Salsabila Zahirah Pranida, Irina Nikishina +15 more

The paper introduces MIDI, a novel multilingual dataset that embeds idioms in realistic sentence and conversational contexts across diverse resource levels, revealing that idiom comprehension is signi…

cs.CL, cs.AI | Recent | May 30, 2026

EPIC: Efficient and Parallel Inference under CFG Constraints for Diffusion Language Models

Hyundong Jin, Yo-Sub Han

The paper proposes EPIC, an efficient and parallel decoding framework that significantly speeds up the process of constraining diffusion language model outputs using Context-Free Grammars (CFG).

cs.CL | Recent | May 29, 2026

Parameter Alignment Mitigates Catastrophic Forgetting in Multilingual Expert Language Models

Sanchit Ahuja, Terra Blevins

The paper introduces and evaluates five parameter alignment strategies that significantly mitigate catastrophic forgetting when continually pretraining multilingual expert language models across multi…

cs.CL, cs.AI, cs.LG | Recent | May 27, 2026

Extracting Small Translation Specialists from LLMs by Aggressively Pruning Experts

Liu O. Martin, Lucas Bandarkar, Nanyun Peng

The paper proposes an aggressive, parameter-efficient method to prune non-essential experts from Mixture-of-Experts (MoE) LLMs, significantly compressing the model while maintaining high machine trans…

cs.AI, cs.SE | Recent | May 28, 2026

ParaTool: Shifting Tool Representations from Context to Parameters

Zekai Yu, Qi Meng, Qizhi Chu, Yu Hao +2 more

ParaTool introduces a novel framework that shifts tool representations from bulky context documentation to dedicated, loadable parameters, enabling efficient and robust tool calling in LLMs.

cs.CV, cs.CL | Recent | May 29, 2026

Towards Effective Long-Video Event Prediction via Multi-Level Event Semantics Mining

Bo Peng, YuanJie Lyu, PengGang Qin, Tong Xu

The paper proposes VISTA, a multi-level event semantics mining framework, to accurately predict complex events in long videos, addressing the limitations of current LLMs in this domain.

cs.CL, cs.AI, cs.LG | Empirical | Recent | Jun 11, 2026

SkMTEB: Slovak Massive Text Embedding Benchmark and Model Adaptation

Marek Šuppa, Andrej Ridzik, Daniel Hládek, Natália Kňažeková +1 more

This paper introduces SkMTEB, a comprehensive text embedding benchmark for Slovak, and develops efficient, locally-deployable Slovak embeddings.

cs.LG, cs.AI | Recent | May 31, 2026

Soft-NBCE: Entropy-Weighted Chunk Fusion for Long-Context

Shihao Ji, Mingyu Li, Zihui Song

Soft-NBCE introduces soft entropy-weighted chunk fusion to overcome the semantic fragmentation caused by hard chunk selection in long-context LLMs, significantly improving performance on multi-hop ben…

cs.CR, cs.AI | Recent | May 7, 2026

Fine-Tuning Small Language Models for Solution-Oriented Windows Event Log Analysis

Siraaj Akhtar, Saad Khan, Simon Parkinson

This paper demonstrates that fine-tuning small language models (SLMs) on a synthetic, solution-rich Windows event log dataset allows them to outperform larger LLMs in identifying issues and providing…

cs.AI | Recent | May 28, 2026

Accelerating Constrained Decoding with Token Space Compression

Michael Sullivan, Alexander Koller

The paper introduces CFGzip, an offline token space compression technique that significantly reduces the computational overhead of constrained decoding, making complex grammar enforcement feasible at…

cs.LG, cs.CL | Recent | May 31, 2026

CART: Context-Anchored Recurrent Transformer -- A Parameter-Efficient Architecture with Learned Stability

Chad A. Capps

CART introduces a parameter-efficient recurrent transformer architecture that reuses a core block multiple times, but its performance does not surpass a dense baseline, suggesting that weight sharing…

cs.CL | Recent | Jun 1, 2026

PortBERT: Navigating the Depths of Portuguese Language Models

Raphael Scheible-Schmitt, Henry He, Armando B. Mendes

The paper introduces PortBERT, a family of RoBERTa-based language models for Portuguese, which achieves competitive performance while explicitly balancing efficiency and accuracy.

cs.CL, cs.IR | Empirical | Recent | Jun 10, 2026

uva-irlab-conv at SemEval-2026 Task 8: Multi-Turn RAG with Learned Sparse Retrieval and Listwise Reranking

Simon Lupart, Kidist Amde Mekonnen, Zahra Abbasiantaeb, Mohammad Aliannejadi

This paper proposes a multi-turn retrieval-augmented generation pipeline for conversational systems across four domains.
