Tianlong Nan

1 indexed paper

Top categories

ML ×1, AI ×1

Frequent co-authors

Xiaopeng Li (1×)

Christian Kroer (1×)

Tianyi Lin (1×)

Research Timeline

2026

Efficient Exploration for Iterative Nash Preference Optimization

The paper proposes an iterative Nash Learning from Human Feedback (NLHF) algorithm with explicit exploration that achieves strong regret bounds when optimizing LLMs against complex, non-scalar human preferences.

Papers

cs.LG · cs.AI · Recent · May 31, 2026

Efficient Exploration for Iterative Nash Preference Optimization

Tianlong Nan, Xiaopeng Li, Christian Kroer, Tianyi Lin
