Jiang Liu
3 indexed papers
Publications per year
Top categories
Frequent co-authors
Research Timeline
The paper introduces DiffErase, a black-box attack that effectively removes inaudible audio watermarks while preserving perceptual quality by utilizing diffusion models.
The paper proposes Predictive Routing Replay (PR2) to stabilize reinforcement learning on Mixture of Experts (MoE) LLMs by predicting and incorporating short-horizon router evolution during training and rollout.
The paper introduces Lookahead Group Reward (&) to combat Supervision Fidelity Decay (SFD) in on-policy distillation, significantly improving student model performance on long reasoning tasks.
Papers
PR2: Predictive Routing Replay for MoE-Based LLM Reinforcement Learning
Daize Dong, Junlin Chen, Haolong Jia, Jiawei Wu +8 more
The paper proposes Predictive Routing Replay (PR2) to stabilize reinforcement learning on Mixture of Experts (MoE) LLMs by predicting and incorporating short-horizon router evolution during training a…