Papers similar to 2605.30792

Similar to 2605.30792 · 10 results

cs.CL · cs.AI · eess.AS · Recent · May 31, 2026

PolySpeech-100: A Large-Scale Benchmark for Speech Understanding Across 100+ Languages and Dialects

Sicheng Yang, Shulan Ruan, Shiwei Wu, Yu Liu +3 more

PolySpeech-100 introduces a massive, multi-lingual benchmark covering 110 linguistic variants to rigorously test Speech-LLMs, demonstrating that open-source models struggle with low-resource languages…

cs.CL · cs.AI · Recent · May 27, 2026

KVoiceBench, KOpenAudioBench, and KMMAU: Agent-Driven Korean Speech Benchmarks for Evaluating SpeechLMs

Haechan Kim, Seungjun Chung, Inkyu Park, Jihoo Lee +1 more

The paper introduces three new Korean speech benchmarks (KVoiceBench, KOpenAudioBench, and KMMAU) to evaluate SpeechLMs, demonstrating that English-centric evaluation fails to capture performance gaps…

cs.CL · Recent · May 29, 2026

Model-Based Quality Assessment for Massively Multilingual Parallel Data

Abdelaziz M. A. Ibrahim, Zihao Li, Jörg Tiedemann, Shaoxiong Ji

The paper proposes decomposing the assessment of massive multilingual parallel data into separate parallelism and quality estimation components, concluding that no single universal metric is reliable…

cs.AI · Recent · May 27, 2026

Bandwidth-Efficient and Privacy-Preserving Edge-Cloud Many-to-Many Speech Translation

Yexing Du, Kaiyuan Liu, Youcheng Pan, Bo Yang +3 more

The paper proposes ESRT, an edge-cloud framework that achieves state-of-the-art, bandwidth-efficient, and privacy-preserving many-to-many speech translation across 45 languages by splitting the model…

cs.CL · cs.AI · cs.SD · Recent · May 29, 2026

Scaling Conversational Hungarian ASR: The BEA-Dialogue+ Corpus

Máté Gedeon, Piroska Zsófia Barta, Péter Mihajlik, Katalin Mády

The paper introduces BEA-Dialogue+, an expanded 200-hour corpus for Hungarian conversational ASR, demonstrating that while larger data is challenging, specialized fine-tuning techniques significantly…

eess.AS · cs.AI · cs.SD · Recent · May 29, 2026

A Unified and Reproducible Experimentation Framework for Speech Understanding

Jing Peng, Junhao Du, Chenghao Wang, Hanqi Li +20 more

The paper introduces SURE, a unified framework designed to standardize and improve the comparability and reproducibility of evaluations for advanced speech understanding models.

cs.CL · cs.AI · cs.SD · Recent · May 29, 2026

DOA: Training-Free Decoder-Only Attention Policy for Long-Form Simultaneous Translation with SpeechLLMs

Sara Papi, Luisa Bentivogli

The paper proposes DOA, a training-free attention policy that leverages self-attention in decoder-only SpeechLLMs to achieve high-quality, low-latency simultaneous long-form translation without requir…

cs.CL · cs.HC · Recent · May 29, 2026

Translation Analytics for Freelancers II: Benchmarking Local LLMs for Confidential Translation Workflows

Yuri Balashov, Rex VanHorn, Mingxi Xu, Austin Downes

The paper benchmarks local, offline LLMs for confidential translation workflows, demonstrating that while they are viable for privacy-sensitive use, they generally lag behind top commercial NMT system…

cs.LG · cs.AI · eess.AS · Recent · May 31, 2026

MURMUR: An Efficient Inference System for Long-Form ASR

Wei-Tzu Lee, Keisuke Kamahori, Baris Kasikci

Murmur is an efficient inference system for long-form ASR that resolves the accuracy-latency trade-off by optimizing both inter-chunk processing and intra-chunk attention mechanisms.

cs.SD · cs.AI · eess.AS · Recent · May 29, 2026

Chatterbox-Flash: Prior-Calibrated Block Diffusion for Streaming Zero-Shot TTS

Deokjin Seo, Gangin Park, Kihyun Nam

Chatterbox-Flash introduces a prior-calibrated block diffusion model for zero-shot TTS that achieves high-fidelity, streaming synthesis with significantly lower computational overhead than existing me…