ArXivCSExplorer
Built by Teycir Ben Soltane

Similar to 2606.14004 · 5 results

cs.CL · cs.AI · eess.AS · Recent · May 31, 2026

PolySpeech-100: A Large-Scale Benchmark for Speech Understanding Across 100+ Languages and Dialects

Sicheng Yang, Shulan Ruan, Shiwei Wu, Yu Liu +3 more

PolySpeech-100 introduces a massive, multilingual benchmark covering 110 linguistic variants to rigorously test Speech-LLMs, demonstrating that open-source models struggle with low-resource languages…

cs.CL · Recent · May 31, 2026

Sparse Autoencoders for Interpretable Emotion Control in Text-to-Speech

Hongfei Du, Jiacheng Shi, Sidi Lu, Gang Zhou +1 more

The paper uses sparse autoencoders to identify specific latent features within LLM-based TTS models, enabling interpretable and fine-grained control over emotional expression by intervening in small s…

cs.SD · cs.AI · Recent · May 29, 2026

MindVoice: Reconstructing Intelligible Speech from Non-invasive Neural Signals with Pretrained Priors

Guangyin Bao, Taiping Zeng, Jianfeng Feng, Xiangyang Xue

MindVoice is a neuro-to-speech framework that uses pretrained priors to disentangle and reconstruct intelligible speech from noisy, non-invasive neural signals, significantly outperforming existing me…

eess.AS · cs.AI · cs.SD · Recent · May 27, 2026

LoSATok: Low-dimensional Semantic-Acoustic Tokenizer for Cross-Domain Audio Understanding and Generation

Zhisheng Zhang, Xiang Li, Yixuan Zhou, Jing Peng +2 more

LoSATok proposes a low-dimensional semantic-acoustic tokenizer that efficiently compresses high-dimensional audio features into a compact latent space, significantly improving the performance and effi…

eess.AS · cs.CR · cs.LG · Recent · May 4, 2026

Dimensionality-Aware Anomaly Detection in Learned Representations of Self-Supervised Speech Models

Sandra Arcos-Holzinger, Sarah M. Erfani, James Bailey, Sanjeev Khudanpur

The paper introduces GRIDS, a framework using Local Intrinsic Dimensionality (LID) to detect anomalies in self-supervised speech model representations, showing that LID elevation correlates with ASR d…
