~ similar to 2606.02147· 18 results
Sarmistha Das, Vaibhav Vishal, Shreyas Guha, Amaan Ali +2 more
This paper introduces a Hybrid Mixture-of-Experts (HybridMoE) framework and a specialized corpus (Varnika) to significantly improve language models' ability to understand and retain figurative, cultur…
The paper develops a theoretically grounded framework for evaluating multilingual LLMs in Social Sciences and Humanities, moving beyond traditional NLP benchmarks to assess interpretive validity and c…
The paper introduces CARTE, a new benchmark designed to test how well large language models understand fine-grained, regionally differentiated knowledge across the 13 metropolitan regions of France, r…
The paper introduces ACROS, a method that induces an explicit sense representation pathway into a frozen pretrained decoder LM, enabling sense-based tasks like disambiguation and cross-lingual alignme…
Linfeng Liu, Tiffany Zhan, Louie Hong Yao, Saptarshi Ghosh +1 more
The paper demonstrates that the internal signals governing figurative language generation are reusable across multiple languages, showing that a steering direction learned in one language can effectiv…
The paper investigates compositional abilities in LLMs and humans using the Personal Relation Task, finding that LLMs excel at the structured (Intensional) task while humans are better at the real-wor…
The paper introduces a diagnostic framework to decompose multilingual LLM performance variance, showing that language identity and model-benchmark interactions are key drivers of performance gaps.
The paper investigates whether modestly sized open-source language models can grasp the semantics of rare Paired-Focus constructions, finding that understanding emerges later in training and correlate…
Md Arid Hasan, Ruwad Naswan, Farhan Samir, Sharifa Sultana +1 more
The paper demonstrates that using English prompts causes large language models to prioritize globally dominant narratives over local cultural knowledge, even when local evidence is provided.
The paper proposes MIMO, a two-stage framework that improves Multilingual Information Retrieval (MLIR) by stabilizing cross-lingual alignment and enhancing retrieval discrimination using a combination…
This paper proposes a domain-specialized large language model, PoetryQwen, for precise translation and emotional understanding of classical poetry.
This study systematically analyzes strategies for creating reliable multilingual LLMs-as-a-judge, finding that fine-tuning smaller models with in-domain data is effective, while zero-shot evaluation w…
The paper introduces PortBERT, a family of RoBERTa-based language models for Portuguese, which achieves competitive performance while explicitly balancing efficiency and accuracy.
The paper proposes an aggressive, parameter-efficient method to prune non-essential experts from Mixture-of-Experts (MoE) LLMs, significantly compressing the model while maintaining high machine trans…
Jingjie Lin, Bingbing Wang, Zihan Wang, Zhengda Jin +3 more
The paper introduces RefMem-Bench, a new benchmark for measuring reflective memory in long-horizon dialogue, and proposes REMIND, a framework that significantly improves models' ability to synthesize…
Yangfan Ye, Xiaocheng Feng, Jialong Tang, Xiayu Cao +4 more
The paper introduces CultureForest, a new benchmark for evaluating Cultural Norm Grounded Reasoning in LLMs, demonstrating that models struggle to apply their cultural knowledge effectively in realist…
Haechan Kim, Seungjun Chung, Inkyu Park, Jihoo Lee +1 more
The paper introduces three new Korean speech benchmarks (KVoiceBench, KOpenAudioBench, and KMMAU) to evaluate SpeechLMs, demonstrating that English-centric evaluation fails to capture performance gaps…
This paper evaluates the causal reasoning abilities of large language models and finds that they rely heavily on lexical pattern matching rather than structural reasoning.