Jiazhen Hu
2 indexed papers
Publications per year
Top categories
Frequent co-authors
Research Timeline
The paper proposes Skill-Conditioned Gated Self-Distillation (SGSD), a novel framework that uses retrieved, potentially noisy skills to guide LLM reasoning, achieving state-of-the-art performance on mathematical reasoning benchmarks.
VideoMLA introduces a novel Multi-Head Latent Attention (MLA) mechanism that replaces per-head KV caches with a shared low-rank content latent, significantly reducing memory and improving throughput for autoregressive video diffusion.
Papers
VideoMLA: Low-Rank Latent KV Cache for Minute-Scale Autoregressive Video Diffusion
VideoMLA introduces a novel Multi-Head Latent Attention (MLA) mechanism that replaces per-head KV caches with a shared low-rank content latent, significantly reducing memory and improving throughput f…