Jialian Wu
1 indexed paper
Recent (6 mo): 1
With code: 0
Influential cites: 0
Benchmarked: 0
Publications per year (chart not reproduced)
Top categories
ML ×1, AI ×1
Research Timeline
2026
PR2: Predictive Routing Replay for MoE-Based LLM Reinforcement Learning
The paper proposes Predictive Routing Replay (PR2) to stabilize reinforcement learning on Mixture of Experts (MoE) LLMs by predicting and incorporating short-horizon router evolution during training and rollout.
Papers
cs.LG · cs.AI · Recent · May 29, 2026
PR2: Predictive Routing Replay for MoE-Based LLM Reinforcement Learning
Daize Dong, Junlin Chen, Haolong Jia, Jiawei Wu +8 more