Similar to 2605.30668 · 14 results
Sicheng Yang, Shulan Ruan, Shiwei Wu, Yu Liu +3 more
PolySpeech-100 introduces a massive multilingual benchmark covering 110 linguistic variants to rigorously test Speech-LLMs, demonstrating that open-source models struggle with low-resource languages…
This paper evaluates the causal reasoning abilities of large language models and finds that they rely heavily on lexical pattern matching rather than structural reasoning.
The paper introduces Semantic Flow Regularization (SFR), an auxiliary objective that significantly improves the diversity and quality of LLM responses when fine-tuned for specific styles or personas,…
Daniel Arnould, Rashad Aziz, Zixuan Kang, Tanav Changal +4 more
CA-BED is a novel framework that improves LLM performance in interactive question-answering by integrating Bayesian Experimental Design to strategically select questions that maximize information gain…
Jingjie Lin, Bingbing Wang, Zihan Wang, Zhengda Jin +3 more
The paper introduces RefMem-Bench, a new benchmark for measuring reflective memory in long-horizon dialogue, and proposes REMIND, a framework that significantly improves models' ability to synthesize…
The paper introduces BEA-Dialogue+, an expanded 200-hour corpus for Hungarian conversational ASR, demonstrating that while larger data is challenging, specialized fine-tuning techniques significantly…
Heyang Liu, Ziyang Cheng, Jiayi Huang, Wenyang Xiao +4 more
The paper proposes LaSR, a context-aware training paradigm that uses latent reasoning to significantly improve speech recognition, especially for specialized terminology, without adding latency.
Han Zhang, Zihao Tang, Xin Yu, Xiao Liu +7 more
The paper introduces RHELM, a new benchmark designed to test LLMs' long-term memory by simulating realistic, complex, and evolving dialogues that integrate multiple heterogeneous data sources.
Bingyu Li, Da Zhang, Tao Huo, Zhiyuan Zhao +2 more
The paper introduces Multi-temporal Referring Segmentation (MTRS), a new task requiring models to segment language-described temporal changes, and proposes MTRefSeg-R1, a specialized framework that ac…
Soft-NBCE introduces soft entropy-weighted chunk fusion to overcome the semantic fragmentation caused by hard chunk selection in long-context LLMs, significantly improving performance on multi-hop ben…
Siddhesh Milind Pawar, Sarah Masud, Haneul Yoo, Alice Oh +1 more
The paper introduces FRANZ, a communicative audit framework, to evaluate how LLMs frame responses to subjective questions, finding that LLMs exhibit statistically significant and coupled differences i…
The paper introduces OCC-RAG, a family of compact, task-specialized Small Language Models (SLMs) designed to achieve highly faithful, multi-hop question answering grounded strictly in provided context…
Shuai Xiao, Su Liu, Weikai Zhou, Jialun Wu +3 more
Persona prompting does not universally improve LLM performance; instead, it systematically trades increased expertise depth for reduced clarity, making multi-metric evaluation essential.
This study systematically evaluates a wide range of chunking methods for Retrieval-Augmented Generation (RAG) to assess their effectiveness and highlight the overlooked challenges associated with chun…