Abolfazl Razi
1 indexed paper
Recent (6 mo): 1
With code: 0
Influential cites: 0
Benchmarked: 0
Publications per year
2026: 1
Top categories
AI ×1, ML ×1
Research Timeline
2026
S-SPPO: Semantic-Calibrated Self-Play Preference Optimization
S-SPPO introduces a dual-space semantic calibration framework to stabilize Self-Play Preference Optimization (SPPO), preventing policy degeneration when preference oracles assign overly confident wins to semantically similar responses.
Papers
cs.AI · cs.LG · Recent · Jun 1, 2026
S-SPPO: Semantic-Calibrated Self-Play Preference Optimization
Xiwen Chen, Wenhui Zhu, Jingjing Wang, Peijie Qiu +12 more