Similar to 2606.14091 · 10 results
Sicheng Yang, Shulan Ruan, Shiwei Wu, Yu Liu +3 more
PolySpeech-100 introduces a large multilingual benchmark covering 110 linguistic variants to rigorously test Speech-LLMs, demonstrating that open-source models struggle with low-resource languages…
Echo is a joint-embedding predictive architecture that uses a single, pretrained ViT encoder to simultaneously perform speaker diarization, speech recognition, and dynamic source separation in a share…
The paper introduces BEA-Dialogue+, an expanded 200-hour corpus for Hungarian conversational ASR, demonstrating that while larger data is challenging, specialized fine-tuning techniques significantly…
Zhefan Wang, Zhiqiang Guo, Weizhi Ma, Min Zhang +2 more
The paper introduces PersTurnBench, a novel benchmark and evaluator for assessing personalized user conversation satisfaction at specific turns, addressing the limitation of generic response quality m…
Sijin Sun, Liangbin Zhao, Jiaxiang Cai, Ming Deng +2 more
CobSeg introduces a multi-branch architecture that enhances dialogue topic segmentation by explicitly modeling both semantic coherence and local lexical boundary transitions, achieving state-of-the-ar…
This paper conducts a large-scale audit of human annotation reporting in NLP, finding that while reporting has improved, critical details needed to assess annotation validity, such as training and agr…
Haechan Kim, Seungjun Chung, Inkyu Park, Jihoo Lee +1 more
The paper introduces three new Korean speech benchmarks (KVoiceBench, KOpenAudioBench, and KMMAU) to evaluate SpeechLMs, demonstrating that English-centric evaluation fails to capture performance gaps…
Han Zhang, Zihao Tang, Xin Yu, Xiao Liu +7 more
The paper introduces RHELM, a new benchmark designed to test LLMs' long-term memory by simulating realistic, complex, and evolving dialogues that integrate multiple heterogeneous data sources.
The paper introduces a novel, scalable framework to monitor and classify dataset usage within research literature, addressing the current lack of infrastructure for tracking data citations.