Kai Yu

8 indexed papers

Recent (6 mo)

With code

Influential cites

Benchmarked

Publications per year

Top categories

AI×8Audio and Speech Processing×3NLP×2Sound×2Vision×1Software Eng.×1

Frequent co-authors

Hanqi Li2×

Yu Xi2×

Xie Chen2×

Junxia Cui1×

Haotian Ye1×

Runchu Tian1×

Research Timeline

2026

HoliTok:A Coutinuous Holistic Tokenization with Robust Dual Capabilities of Speech Generation and Understanding

HoliTok introduces a novel continuous holistic tokenization model that provides a unified, high-fidelity latent representation for simultaneously supporting both speech generation and speech understanding tasks.

ParaTool: Shifting Tool Representations from Context to Parameters

ParaTool introduces a novel framework that shifts tool representations from bulky context documentation to dedicated, loadable parameters, enabling efficient and robust tool calling in LLMs.

DeepSurvey: Enhancing Analytical Depth and Citation Reliability in Automated Survey Generation

DeepSurvey is an agentic system that significantly enhances automated survey generation by extracting deep, structured knowledge from full-text papers and rigorously validating citations, achieving superior content depth and reliability compared to existing methods.

Towards Human-Like Interactive Speech Recognition With Agentic Correction and Semantic Evaluation

The paper proposes Agentic ASR, a closed-loop framework that treats ASR as a multi-turn refinement task, significantly improving semantic accuracy over traditional token-level metrics.

A Unified and Reproducible Experimentation Framework for Speech Understanding

The paper introduces SURE, a unified framework designed to standardize and improve the comparability and reproducibility of evaluations for advanced speech understanding models.

OpenSTBench: Beyond Semantic Evaluation for Speech Translation

The paper introduces OpenSTBench, a unified, multidimensional evaluation framework designed to comprehensively compare heterogeneous speech translation systems by jointly assessing translation, speech, and temporal qualities.

ProductWebGen: Benchmarking Multimodal Product Webpage Generation

The paper introduces ProductWebGen, a benchmark for evaluating multimodal models' ability to generate consistent, high-fidelity product webpages from images and instructions, finding that separate editing-based workflows outperform unified models in overall webpage instruction following.

SimSD: Simple Speculative Decoding in Diffusion Language Models

The paper proposes SimSD, a plug-and-play speculative decoding algorithm that adapts diffusion language models (dLLMs) to achieve fast, token-level acceleration by restoring causal masking capabilities.

Highlighted terms show continued research focus across papers

Papers

cs.CLcs.AIRecentJun 1, 2026

SimSD: Simple Speculative Decoding in Diffusion Language Models

Junxia Cui, Haotian Ye, Runchu Tian, Hongcan Guo +8 more

View →

cs.CVcs.AIRecentMay 31, 2026