Papers similar to 2606.01276

Similar to 2606.01276 · 19 results

cs.CL · Recent · May 28, 2026

When English Rewrites Local Knowledge: Global Narrative Dominance in Large Language Models

Md Arid Hasan, Ruwad Naswan, Farhan Samir, Sharifa Sultana +1 more

The paper demonstrates that using English prompts causes large language models to prioritize globally dominant narratives over local cultural knowledge, even when local evidence is provided.

cs.CL · Recent · Jun 1, 2026

When Meaning Travels: A Granular Lens on Hybrid-MoE's Role in Idiomatic Understanding for Language Models

Sarmistha Das, Vaibhav Vishal, Shreyas Guha, Amaan Ali +2 more

This paper introduces a Hybrid Mixture-of-Experts (HybridMoE) framework and a specialized corpus (Varnika) to significantly improve language models' ability to understand and retain figurative, cultur…

cs.CL · cs.AI · Empirical · Recent · Jun 10, 2026

System Report for CCL25-Eval Task 5: New Dataset and LoRA-Fine-Tuned Qwen2.5

Haotao Xie

This paper proposes a domain-specialized large language model, PoetryQwen, for precise translation and emotional understanding of classical poetry.

cs.CL · cs.HC · Recent · May 29, 2026

Translation Analytics for Freelancers II: Benchmarking Local LLMs for Confidential Translation Workflows

Yuri Balashov, Rex VanHorn, Mingxi Xu, Austin Downes

The paper benchmarks local, offline LLMs for confidential translation workflows, demonstrating that while they are viable for privacy-sensitive use, they generally lag behind top commercial NMT system…

cs.CL · cs.AI · Recent · May 28, 2026

Loong: A Human-Like Long Document Translation Agent with Observe-and-Act Adaptive Context Selection

Yutong Wang, Xuebo Liu, Derek F. Wong, Zhilin Li +5 more

The paper introduces Loong, a novel human-like agent that significantly improves long document translation by adaptively selecting and utilizing optimal historical context using a specialized memory m…

cs.CL · Recent · May 30, 2026

Toward Responsible and Epistemically Grounded Multilingual LLMs for Computational Social Science and Humanities

Wajdi Zaghouani

The paper develops a theoretically grounded framework for evaluating multilingual LLMs in Social Sciences and Humanities, moving beyond traditional NLP benchmarks to assess interpretive validity and c…

cs.CL · Recent · May 29, 2026

The Latin Substrate: How Language Models Represent and Mediate Script Choice

Daniil Gurgurov, Alan Saji, Katharina Trinley, Josef van Genabith +1 more

This paper investigates how LLMs handle multiple writing systems, finding that while they use shared latent representations, they exhibit a structural bias that makes generating Latin script eas…

cs.AI · Recent · May 27, 2026

Do LLMs Build World Models From Text? A Multilingual Diagnostic of Spatial Reasoning

Zhikai Pan, Chih-Ting Liao, Chunrui Liu, Xi Xiao +4 more

The paper introduces a multilingual benchmark (MentalMap) to test if LLMs build internal spatial world models from text, finding a universal 'L3 reasoning cliff' suggesting that text-only working memo…

cs.CL · Recent · May 29, 2026

Extending AI for Research to the Humanities: A Multi-Agent Framework for Evidence-Grounded Scholarship

Yating Pan, Jiajun Zhang, Jun Wang, Qi Su

The paper introduces SPIRE, a multi-agent framework designed to extend LLM research capabilities to the humanities by enabling evidence-grounded interpretive reasoning over primary sources.

cs.CL · Recent · Jun 1, 2026

CultureForest: Understanding and Evaluating Cultural Norm Grounded Reasoning in LLMs

Yangfan Ye, Xiaocheng Feng, Jialong Tang, Xiayu Cao +4 more

The paper introduces CultureForest, a new benchmark for evaluating Cultural Norm Grounded Reasoning in LLMs, demonstrating that models struggle to apply their cultural knowledge effectively in realist…

cs.CL · Recent · May 31, 2026

From Outliers to Errors: Auditing Pali-to-English LLM Translations with Multi-Reference Adjudication

Máté Metzger, Nadnapang Phophichit, Hansa Dhammahaso

The paper proposes an advanced auditing framework for classical-to-modern LLM translations, demonstrating that embedding drift signals potential error severity rather than error itself, and identifyin…

cs.CL · cs.AI · cs.LG · Recent · May 27, 2026

Extracting Small Translation Specialists from LLMs by Aggressively Pruning Experts

Liu O. Martin, Lucas Bandarkar, Nanyun Peng

The paper proposes an aggressive, parameter-efficient method to prune non-essential experts from Mixture-of-Experts (MoE) LLMs, significantly compressing the model while maintaining high machine trans…

cs.AI · cs.CL · cs.LG · Recent · May 27, 2026

Cultural Binding Heads in Language Models

Avrile Floro, Luca Benedetto

The paper identifies specific attention heads in LLMs responsible for 'cultural binding'—associating cultural items with appropriate identities—and demonstrates that this capability is pre-trained and…

cs.CL · cs.AI · cs.LG · Recent · Jun 1, 2026

Multilinguality of Large Language Models From a Structural Perspective

Haruki Sakajo, Yusuke Sakai, Hidetaka Kamigaito, Taro Watanabe

This paper analyzes the multilinguality of LLMs by examining their structural properties, finding that low-resource languages are structurally more distinct from English than high-resource languages,…

cs.CL · Recent · May 29, 2026

How Much Do LLMs Know About Chinese Zero Pronouns?

Yifei Li, Guanyi Chen, Tingting He

This paper systematically investigates the difficulty of Chinese Zero Pronouns (ZPs) for various LLMs, concluding that ZPs remain a significant and persistent challenge, with state-of-the-art models p…

cs.CL · Recent · May 31, 2026

Agentic Clustering: Controllable Text Taxonomies via Multi-Agent Refinement

Simon Löwe, Emily Silcock

The paper introduces an agentic framework for text clustering that dynamically adapts the taxonomy generation process using specialized LLM agents, achieving state-of-the-art performance on multiple b…

cs.AI · cs.CL · Recent · May 28, 2026

Demystifying Data Organization for Enhanced LLM Training

Yalun Dai, Yangyu Huang, Tongshen Yang, Yonghan Wang +7 more

This paper proposes four guidelines and two novel data ordering methods (STR and SAW) to systematically optimize data organization, significantly enhancing the stability and performance of LLM trainin…

cs.CL · Recent · Jun 1, 2026

AI as a Tool for Simulation-Based Experiments in Literary Studies

Matthew Wilkens

The paper outlines the potential for using generative AI to conduct large-scale, simulation-based experiments in literary studies, demonstrating initial results in generating constrained literary text…

cs.CL · cs.AI · Recent · May 31, 2026

Understanding LLM Behavior in Multi-Target Cross-Lingual Summarization

Sangwon Ryu, Yihong Liu, Mingyang Wang, Yunsu Kim +3 more

The paper introduces a new benchmark for multi-target cross-lingual summarization (MTXLS) and proposes an activation steering method that significantly improves LLM performance by guiding the generati…