~ similar to 2606.13647· 19 results
This paper introduces robustness indicators to systematically analyze how multilingual text embedding model rankings change based on dataset composition and aggregation methods, revealing that only a…
The paper proposes decomposing the assessment of massive multilingual parallel data into separate parallelism and quality estimation components, concluding that no single universal metric is reliable…
The paper introduces XLGoBench, a synthetic benchmark of algorithmic tasks designed to detect persistent cross-lingual skill gaps in large language models.
The paper introduces TSM-Bench, a new benchmark that demonstrates existing LLM-generated text detectors fail to accurately identify task-specific machine-generated content found in real-world Wikipedi…
The paper proposes MIMO, a two-stage framework that improves Multilingual Information Retrieval (MLIR) by stabilizing cross-lingual alignment and enhancing retrieval discrimination using a combination…
Sicheng Yang, Shulan Ruan, Shiwei Wu, Yu Liu +3 more
PolySpeech-100 introduces a massive, multi-lingual benchmark covering 110 linguistic variants to rigorously test Speech-LLMs, demonstrating that open-source models struggle with low-resource languages…
This study systematically analyzes strategies for creating reliable multilingual LLMs-as-a-judge, finding that fine-tuning smaller models with in-domain data is effective, while zero-shot evaluation w…
Yu-Che Tsai, Kuan-Yu Chen, Yuan-Hao Chen, Yu-Han Chang +3 more
PromptEmbedder introduces a dual-LLM framework that efficiently and transferably adapts text embeddings by decoupling task-specific knowledge from the backbone model, significantly reducing computatio…
This study systematically evaluates a wide range of chunking methods for Retrieval-Augmented Generation (RAG) to assess their effectiveness and highlight the overlooked challenges associated with chun…
The paper introduces UA-Legal-Bench, a comprehensive Ukrainian legal reasoning benchmark built from a massive judicial corpus, demonstrating that LLM performance is highly task-dependent and that simp…
The paper introduces Multi-Legal-Bench, a novel cross-jurisdictional benchmark evaluating LLMs on five standardized legal reasoning tasks across six diverse countries, demonstrating that cross-lingual…
The paper introduces and evaluates five parameter alignment strategies that significantly mitigate catastrophic forgetting when continually pretraining multilingual expert language models across multi…
This paper details the systematic construction and training of a high-performing Romanian Vision-Language Model (VLM), demonstrating that language-specific adaptation significantly boosts performance…
Sangwon Ryu, Yihong Liu, Mingyang Wang, Yunsu Kim +3 more
The paper introduces a new benchmark for multi-target cross-lingual summarization (MTXLS) and proposes an activation steering method that significantly improves LLM performance by guiding the generati…
The paper proposes using Maximum Independent Set (MIS) algorithms on similarity graphs to select a maximally diverse and non-redundant subset of prompts for LLM benchmarking, achieving consistent rank…
This paper proposes a lightweight encoder-based MEL solution called FAST-MEL that meets three objectives: high linking accuracy, computational efficiency, and storage efficiency.
The paper proposes an aggressive, parameter-efficient method to prune non-essential experts from Mixture-of-Experts (MoE) LLMs, significantly compressing the model while maintaining high machine trans…
The authors demonstrate that fine-tuning a two-stage retrieval system using synthetic data generated by large language models can significantly improve the performance of medical semantic search for c…
The paper benchmarks local, offline LLMs for confidential translation workflows, demonstrating that while they are viable for privacy-sensitive use, they generally lag behind top commercial NMT system…