Wei Zhu
4 indexed papers
Publications per year
Top categories
Frequent co-authors
Research Timeline
The paper proposes COSE, a method that uses an LLM's intrinsic confidence as an uncertainty signal to improve self-evolutionary training, achieving state-of-the-art performance on general reasoning and mathematics.
Qwen-VLA introduces a unified embodied foundation model that extends vision-language understanding to continuous action generation, enabling robust, multi-task generalization across diverse robotic tasks and embodiments.
ReSkill is an RL-in-the-loop framework that reconciles skill creation and policy optimization by automatically creating, testing, and refining modular skills alongside the agent's policy learning, leading to superior generalization.
DFlare introduces a lightweight layer-wise fusion mechanism to overcome the narrow conditioning bottleneck of existing block diffusion methods, enabling the scaling of draft models and achieving superior speculative decoding speedups across multiple LLMs.
Papers
ReSkill: Reconciling Skill Creation with Policy Optimization in Agentic RL
Zelin He, Haotian Lin, Boran Han, Wei Zhu +5 more
ReSkill is an RL-in-the-loop framework that reconciles skill creation and policy optimization by automatically creating, testing, and refining modular skills alongside the agent's policy learning, lea…