Kai Yu
8 indexed papers
Research Timeline
HoliTok introduces a novel continuous holistic tokenization model that provides a unified, high-fidelity latent representation supporting both speech generation and speech understanding.
ParaTool introduces a novel framework that shifts tool representations from bulky context documentation to dedicated, loadable parameters, enabling efficient and robust tool calling in LLMs.
DeepSurvey is an agentic system that significantly enhances automated survey generation by extracting deep, structured knowledge from full-text papers and rigorously validating citations, achieving superior content depth and reliability compared to existing methods.
The paper proposes Agentic ASR, a closed-loop framework that treats ASR as a multi-turn refinement task, significantly improving semantic accuracy over traditional token-level metrics.
The paper introduces SURE, a unified framework designed to standardize and improve the comparability and reproducibility of evaluations for advanced speech understanding models.
The paper introduces OpenSTBench, a unified, multidimensional evaluation framework designed to comprehensively compare heterogeneous speech translation systems by jointly assessing translation, speech, and temporal qualities.
The paper introduces ProductWebGen, a benchmark for evaluating multimodal models' ability to generate consistent, high-fidelity product webpages from images and instructions, finding that separate editing-based workflows outperform unified models in overall webpage instruction following.
The paper proposes SimSD, a plug-and-play speculative decoding algorithm that adapts diffusion language models (dLLMs) to achieve fast, token-level acceleration by restoring causal masking capabilities.
Papers
SimSD: Simple Speculative Decoding in Diffusion Language Models
Junxia Cui, Haotian Ye, Runchu Tian, Hongcan Guo +8 more