Built by Teycir Ben Soltane
ArXivCSExplorer

Yang Zhou

4 indexed papers

Recent (6 mo): 4
With code: 0
Influential cites: 0
Benchmarked: 0

Publications per year

2026: 4

Top categories

AI ×4, ML ×2, Vision ×1, NLP ×1

Frequent co-authors

Jiakang Li (1×)
Guanyu Zhu (1×)
Can Jin (1×)
Chenxi Huang (1×)
Dexu Yu (1×)
Ronghao Chen (1×)

Research Timeline

2026
OISD: On-Policy Internal Self-Distillation of Language Models

The OISD framework improves language model reasoning by distilling on-policy predictive signals from the final output layer to intermediate representations, leading to substantial improvements on mathematical reasoning tasks.

Structured Prompt Optimization Meets Reinforcement Learning for Global and Local Interpretability over Complex Text

The paper introduces eXTC, a novel framework that combines structured prompt optimization, knowledge distillation, and reinforcement learning to create a highly performant and fully interpretable text classifier.

COMPASS: Cognitive MCTS-Guided Process Alignment for Safe Search Agents

COMPASS introduces a Cognitive MCTS-Guided Process Alignment framework to ensure robust safety for LLM search agents by identifying and supervising risky intermediate steps in multi-step reasoning.

Latent Reward Steering: An Adaptive Inference-Time Framework that Implicitly Promotes Cognitive Behaviors in Reasoning LLMs

The paper introduces Latent Reward Steering (LRS), an adaptive inference-time framework that implicitly improves the reasoning ability of LLMs by guiding the model's internal latent states based on a reward signal derived from final answer correctness.

Papers

cs.AI · Recent · May 30, 2026

Latent Reward Steering: An Adaptive Inference-Time Framework that Implicitly Promotes Cognitive Behaviors in Reasoning LLMs

Jiakang Li, Guanyu Zhu, Can Jin, Chenxi Huang +7 more

cs.AI · Recent · May 29, 2026

COMPASS: Cognitive MCTS-Guided Process Alignment for Safe Search Agents

Wenkai Shen, Pengyang Zhou, Jiahe Xu, Jiaming Qian +4 more

cs.LG, cs.AI, cs.CV · Recent · May 27, 2026

OISD: On-Policy Internal Self-Distillation of Language Models

Xinyu Liu, Darryl Cherian Jacob, Yang Zhou, Jindong Wang +1 more

cs.CL, cs.AI, cs.LG · Recent · May 27, 2026

Structured Prompt Optimization Meets Reinforcement Learning for Global and Local Interpretability over Complex Text

Tianyang Zhou, Wenbo Chen, Pierre Jinghong Liang, Leman Akoglu
