Abolfazl Razi

1 indexed paper

Top categories

AI ×1, ML ×1

Frequent co-authors

Xiwen Chen (1×)

Wenhui Zhu (1×)

Jingjing Wang (1×)

Peijie Qiu (1×)

Zhipeng Wang (1×)

Huayu Li (1×)

Research Timeline

2026

S-SPPO: Semantic-Calibrated Self-Play Preference Optimization

S-SPPO introduces a dual-space semantic calibration framework to stabilize Self-Play Preference Optimization (SPPO), preventing policy degeneration when preference oracles assign overly confident wins to semantically similar responses.

Papers

cs.AI · cs.LG · Recent · Jun 1, 2026

S-SPPO: Semantic-Calibrated Self-Play Preference Optimization

Xiwen Chen, Wenhui Zhu, Jingjing Wang, Peijie Qiu +12 more
