Mingyi Wang
2 indexed papers
Research Timeline
The paper proposes Detector Evasion Policy Optimization (DEPO), a constrained reinforcement learning method for evading AI-generated-text detectors while preserving the original text's semantics.
OmniOPD introduces a logit-free, chunk-level distillation framework that improves on standard On-Policy Distillation by using semantic similarity and peak-entropy scheduling, achieving state-of-the-art performance even with black-box teachers.
Papers
OmniOPD: Logit-Free On-Policy Distillation via Speculative Verification
Yuhang Zhou, Lizhu Zhang, Yifan Wu, Mingyi Wang, and 4 more authors