Built with and by Teycir Ben Soltane•
How to Use•FAQ•GitHub•arXiv.org•
Share:
ArXivCSExplorer
☆☆Bookmarks🏆RSSHow to UseFAQ
Home/Authors/Kai Yu

Kai Yu

8 indexed papers

Recent (6 mo)
8
With code
0
Influential cites
0
Benchmarked
0

Publications per year

8
26

Top categories

AI×8Audio and Speech Processing×3NLP×2Sound×2Vision×1Software Eng.×1

Frequent co-authors

Hanqi Li2×
Yu Xi2×
Xie Chen2×
Junxia Cui1×
Haotian Ye1×
Runchu Tian1×

Research Timeline

2026
HoliTok:A Coutinuous Holistic Tokenization with Robust Dual Capabilities of Speech Generation and Understanding

HoliTok introduces a novel continuous holistic tokenization model that provides a unified, high-fidelity latent representation for simultaneously supporting both speech generation and speech understanding tasks.

ParaTool: Shifting Tool Representations from Context to Parameters

ParaTool introduces a novel framework that shifts tool representations from bulky context documentation to dedicated, loadable parameters, enabling efficient and robust tool calling in LLMs.

DeepSurvey: Enhancing Analytical Depth and Citation Reliability in Automated Survey Generation

DeepSurvey is an agentic system that significantly enhances automated survey generation by extracting deep, structured knowledge from full-text papers and rigorously validating citations, achieving superior content depth and reliability compared to existing methods.

Towards Human-Like Interactive Speech Recognition With Agentic Correction and Semantic Evaluation

The paper proposes Agentic ASR, a closed-loop framework that treats ASR as a multi-turn refinement task, significantly improving semantic accuracy over traditional token-level metrics.

A Unified and Reproducible Experimentation Framework for Speech Understanding

The paper introduces SURE, a unified framework designed to standardize and improve the comparability and reproducibility of evaluations for advanced speech understanding models.

OpenSTBench: Beyond Semantic Evaluation for Speech Translation

The paper introduces OpenSTBench, a unified, multidimensional evaluation framework designed to comprehensively compare heterogeneous speech translation systems by jointly assessing translation, speech, and temporal qualities.

ProductWebGen: Benchmarking Multimodal Product Webpage Generation

The paper introduces ProductWebGen, a benchmark for evaluating multimodal models' ability to generate consistent, high-fidelity product webpages from images and instructions, finding that separate editing-based workflows outperform unified models in overall webpage instruction following.

SimSD: Simple Speculative Decoding in Diffusion Language Models

The paper proposes SimSD, a plug-and-play speculative decoding algorithm that adapts diffusion language models (dLLMs) to achieve fast, token-level acceleration by restoring causal masking capabilities.

Highlighted terms show continued research focus across papers

Papers

cs.CLcs.AIRecentJun 1, 2026

SimSD: Simple Speculative Decoding in Diffusion Language Models

Junxia Cui, Haotian Ye, Runchu Tian, Hongcan Guo +8 more

The paper proposes SimSD, a plug-and-play speculative decoding algorithm that adapts diffusion language models (dLLMs) to achieve fast, token-level acceleration by restoring causal masking capabilitie…

View →
cs.CVcs.AIRecentMay 31, 2026

ProductWebGen: Benchmarking Multimodal Product Webpage Generation

Zhihong Liu, Siqi Kou, Zheng Li, Ye Ma +4 more

The paper introduces ProductWebGen, a benchmark for evaluating multimodal models' ability to generate consistent, high-fidelity product webpages from images and instructions, finding that separate edi…

View →
eess.AScs.AIcs.SDRecentMay 29, 2026

A Unified and Reproducible Experimentation Framework for Speech Understanding

Jing Peng, Junhao Du, Chenghao Wang, Hanqi Li +20 more

The paper introduces SURE, a unified framework designed to standardize and improve the comparability and reproducibility of evaluations for advanced speech understanding models.

View →
eess.AScs.AIRecentMay 29, 2026

OpenSTBench: Beyond Semantic Evaluation for Speech Translation

Yanjie An, Yuxiang Zhao, Yichi Zhang, Qixi Zheng +4 more

The paper introduces OpenSTBench, a unified, multidimensional evaluation framework designed to comprehensively compare heterogeneous speech translation systems by jointly assessing translation, speech…

View →
cs.SDcs.AIeess.ASRecentMay 28, 2026

HoliTok:A Coutinuous Holistic Tokenization with Robust Dual Capabilities of Speech Generation and Understanding

Bohan Li, Shi Lian, Hankun Wang, Yiwei Guo +5 more

HoliTok introduces a novel continuous holistic tokenization model that provides a unified, high-fidelity latent representation for simultaneously supporting both speech generation and speech understan…

View →
cs.AIcs.SERecentMay 28, 2026

ParaTool: Shifting Tool Representations from Context to Parameters

Zekai Yu, Qi Meng, Qizhi Chu, Yu Hao +2 more

ParaTool introduces a novel framework that shifts tool representations from bulky context documentation to dedicated, loadable parameters, enabling efficient and robust tool calling in LLMs.

View →
cs.AIRecentMay 28, 2026

DeepSurvey: Enhancing Analytical Depth and Citation Reliability in Automated Survey Generation

Ziyue Yang, Da Ma, Hanqi Li, Zijian Wang +7 more

DeepSurvey is an agentic system that significantly enhances automated survey generation by extracting deep, structured knowledge from full-text papers and rigorously validating citations, achieving su…

View →
cs.AIcs.CLRecentMay 28, 2026

Towards Human-Like Interactive Speech Recognition With Agentic Correction and Semantic Evaluation

Zixuan Jiang, Yanqiao Zhu, Peng Wang, Qinyuan Chen +7 more

The paper proposes Agentic ASR, a closed-loop framework that treats ASR as a multi-turn refinement task, significantly improving semantic accuracy over traditional token-level metrics.

View →