Min Tang

1 indexed paper

Top categories

NLP (1×)

Frequent co-authors

Xiaobo Wang (1×)

Tong Wu (1×)

Jiaqi Li (1×)

Qi Liu (1×)

Zilong Zheng (1×)

Research Timeline

2026

The Flip Side of RLHF: On-Policy Feedback for Reward Model Self-Supervised Improvement

The paper introduces SAVE, a framework that uses on-policy feedback and the value function to self-supervise and improve reward models, significantly enhancing RLHF performance across multiple benchmarks.

Papers

cs.CL · Recent · May 29, 2026

The Flip Side of RLHF: On-Policy Feedback for Reward Model Self-Supervised Improvement

Xiaobo Wang, Tong Wu, Min Tang, Jiaqi Li +2 more
