Xinyu Chen
2 indexed papers
Research Timeline
MixFP4 introduces a mixed micro-format extension to NVFP4, allowing blocks to dynamically select between two stored FP4 formats (E2M1 and E1M2) to improve quantization accuracy without altering the standard hardware execution path.
TAPS introduces a target-aware prefix selection method that optimizes the trade-off between draft tree acceptance and verification cost, achieving significant speedups in speculative decoding.
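MixFP4's per-block choice between E2M1 and E1M2 can be illustrated with a minimal sketch. It quantizes each block with both 4-bit grids and keeps whichever gives the lower reconstruction error. Several details are assumptions, not taken from the MixFP4 paper: NVFP4-style 16-element blocks, a per-block scale kept in full precision (NVFP4 itself stores an FP8 E4M3 block scale), an E1M2 grid with exponent bias 1, and mean-squared error as the selection criterion.

```python
import math

# Non-negative magnitudes representable by each 4-bit format (sign stored separately).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
# Assumption: E1M2 with exponent bias 1, which yields a uniform 0.25-step grid up to 1.75.
E1M2_GRID = [0.0, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75]


def quantize_block(block, grid):
    """Map the block's max magnitude to the grid max, round each element to the
    nearest grid point, and return the dequantized values."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return [0.0] * len(block)
    # Simplification: the scale stays in full precision instead of FP8 E4M3.
    scale = amax / grid[-1]
    out = []
    for x in block:
        mag = abs(x) / scale
        q = min(grid, key=lambda g: abs(g - mag))  # round to nearest grid point
        out.append(math.copysign(q * scale, x))
    return out


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def choose_format(block):
    """Return (format_name, error, dequantized_block) for the lower-error format.
    Ties keep E2M1, the first candidate."""
    best = None
    for name, grid in (("E2M1", E2M1_GRID), ("E1M2", E1M2_GRID)):
        deq = quantize_block(block, grid)
        err = mse(block, deq)
        if best is None or err < best[1]:
            best = (name, err, deq)
    return best


# A block of evenly spaced values fits the uniform E1M2 grid exactly.
print(choose_format([0.25 * (k % 8) for k in range(16)])[:2])
# A block with one large outlier is better served by E2M1's wider range.
print(choose_format([6.0] + [0.5] * 15)[:2])
```

The intuition behind the two candidates is that E2M1 spends its codes on dynamic range, which suits blocks with outliers, while E1M2 spends them on finer, more uniform spacing. In a real kernel the chosen format would be recorded per block alongside the scale.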
Papers
TAPS: Target-Aware Prefix Tree Selection for Diffusion-Drafted Speculative Decoding
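The trade-off TAPS targets, between how many draft-tree tokens the target model accepts and how much verifying a larger tree costs, can be illustrated with a generic sketch. This is not the TAPS algorithm. It assumes each draft node carries a hypothetical estimate of the probability that the target accepts it given its parent was accepted, and it models verification cost as a hypothetical linear function `c0 + c1 * k` of the number of verified nodes. The expected number of accepted tokens is the sum of root-to-node acceptance products. Taking the top-k nodes by that product always yields a prefix-closed subtree, so the sketch scans k and keeps the subtree with the best expected tokens per unit cost.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    node_id: int
    parent: Optional[int]  # None means the node hangs directly off the last verified token
    accept_prob: float     # hypothetical: P(target accepts this token | parent accepted)


def reach_probs(nodes):
    """Probability that each node lies on the accepted path, plus each node's depth."""
    by_id = {n.node_id: n for n in nodes}
    reach, depth = {}, {}

    def visit(nid):
        if nid in reach:
            return
        n = by_id[nid]
        if n.parent is None:
            reach[nid], depth[nid] = n.accept_prob, 1
        else:
            visit(n.parent)
            reach[nid] = reach[n.parent] * n.accept_prob
            depth[nid] = depth[n.parent] + 1

    for n in nodes:
        visit(n.node_id)
    return reach, depth


def select_prefix(nodes, c0=1.0, c1=0.05):
    """Pick the prefix-closed subtree maximizing expected tokens per verification cost.

    Sorting by (-reach, depth) places every parent before its children, because a
    child's reach never exceeds its parent's, so each top-k set is a valid subtree.
    """
    reach, depth = reach_probs(nodes)
    order = sorted(reach, key=lambda nid: (-reach[nid], depth[nid]))
    # k = 0: verify nothing and still get the target's one bonus token.
    best_k, best_rate, expected = 0, 1.0 / c0, 0.0
    for k, nid in enumerate(order, start=1):
        expected += reach[nid]
        rate = (1.0 + expected) / (c0 + c1 * k)  # +1 for the bonus token
        if rate > best_rate:
            best_k, best_rate = k, rate
    return set(order[:best_k]), best_rate


tree = [
    Node(0, None, 0.9), Node(1, 0, 0.9), Node(2, 1, 0.9),  # confident chain
    Node(3, None, 0.05),                                    # unlikely sibling branch
]
selected, rate = select_prefix(tree)
print(sorted(selected), round(rate, 3))
```

With these made-up numbers the low-probability branch is pruned, because verifying it costs more than the acceptance it is expected to add. With a steep per-node cost, the sketch falls back to verifying nothing.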