~ similar to 2605.29948· 6 results
Yuhan Song, Linhao Zhang, Aiwei Liu, Chuhan Wu +5 more
UniAudio-Token is a framework that enhances existing semantic speech tokenizers with general audio perception, allowing them to handle diverse audio types while maintaining high-fidelity speech capabi…
Zhisheng Zhang, Xiang Li, Yixuan Zhou, Jing Peng +2 more
LoSATok proposes a low-dimensional semantic-acoustic tokenizer that efficiently compresses high-dimensional audio features into a compact latent space, significantly improving the performance and effi…
Yuyue Wang, Xihua Wang, Xin Cheng, Yijing Chen +1 more
The paper introduces PlanAudio, a unified LLM-based framework that directly synthesizes natural, composite audio containing speech and sounds from unconstrained free-form text prompts, outperforming e…
Sicheng Yang, Shulan Ruan, Shiwei Wu, Yu Liu +3 more
PolySpeech-100 introduces a massive, multi-lingual benchmark covering 110 linguistic variants to rigorously test Speech-LLMs, demonstrating that open-source models struggle with low-resource languages…
Jing Peng, Junhao Du, Chenghao Wang, Hanqi Li +20 more
The paper introduces SURE, a unified framework designed to standardize and improve the comparability and reproducibility of evaluations for advanced speech understanding models.
Chen Yang, Chufan Yu, Hanfu Chen, Jie Zhu +21 more
MOSS-Audio is a unified audio-language model designed for comprehensive understanding of speech, environmental sounds, and music, achieving strong performance across various audio-grounded tasks.