Fang Wu
3 indexed papers
Research Timeline
ADWIN introduces an adaptive window framework for on-policy distillation (OPD) that efficiently manages the supervision horizon by training on short, teacher-anchored prefixes while using delayed full-rollout probes to maintain alignment, significantly reducing training cost while preserving accuracy.
The paper investigates multimodal jailbreak robustness across various reasoning paradigms and finds that explicit image-tool interaction significantly improves safety by guiding the model's internal representations toward safer directions.
The paper investigates multimodal jailbreak robustness across various reasoning paradigms and finds that explicit image-tool interaction significantly improves safety by shifting the model's internal representations toward a safety-relevant direction.
Papers
ADWIN: Adaptive Windows for Horizon-Aware On-Policy Distillation
Kun Liang, Chenming Tang, Clive Bai, Weijie Liu +2 more