Christian Kroer
1 indexed paper
Recent (6 mo): 1
With code: 0
Influential cites: 0
Benchmarked: 0
Publications per year
Top categories
ML ×1, AI ×1
Frequent co-authors
Research Timeline
2026
Efficient Exploration for Iterative Nash Preference Optimization
The paper proposes a novel, explicitly exploratory iterative Nash Learning from Human Feedback (NLHF) algorithm that achieves strong regret bounds for optimizing LLMs based on complex, non-scalar human preferences.