Yang Zhou
4 indexed papers
Publications per year
Top categories
Frequent co-authors
Research Timeline
The OISD framework improves language model reasoning by distilling on-policy predictive signals from the final output layer to intermediate representations, leading to substantial improvements on mathematical reasoning tasks.
The paper introduces eXTC, a novel framework that combines structured prompt optimization, knowledge distillation, and reinforcement learning to create a highly performant and fully interpretable text classifier.
COMPASS introduces a Cognitive MCTS-Guided Process Alignment framework to ensure robust safety for LLM search agents by identifying and supervising risky intermediate steps in multi-step reasoning.
The paper introduces Latent Reward Steering (LRS), an adaptive inference-time framework that implicitly improves the reasoning ability of LLMs by guiding the model's internal latent states based on a reward signal derived from final answer correctness.
Papers
Latent Reward Steering: An Adaptive Inference-Time Framework that Implicitly Promotes Cognitive Behaviors in Reasoning LLMs
Jiakang Li, Guanyu Zhu, Can Jin, Chenxi Huang +7 more
The paper introduces Latent Reward Steering (LRS), an adaptive inference-time framework that implicitly improves the reasoning ability of LLMs by guiding the model's internal latent states based on a…