Le Xu
2 indexed papers
Publications per year
Top categories
Frequent co-authors
Research Timeline
The paper proposes CAST, an answer-free self-distillation method that enhances Group Relative Policy Optimization (GRPO) for verifiable rewards, allowing token-level advantage signals even when all sampled trajectories are uniformly correct or incorrect.
Lodestar is a novel online learning-based request routing system that significantly improves LLM inference efficiency by dynamically assigning incoming requests to the optimal GPU instance to minimize latency.
Papers
Lodestar: An Online-Learning LLM Inference Router
Gangmuk Lim, Wanyu Zhao, Brighten Godfrey, Jiaxin Shan +2 more
Lodestar is a novel online learning-based request routing system that significantly improves LLM inference efficiency by dynamically assigning incoming requests to the optimal GPU instance to minimize…