Tadashi Kozuno
1 indexed paper
Recent (6 mo): 1
With code: 0
Influential cites: 0
Benchmarked: 0
Publications per year
2026: 1
Top categories
ML ×1, AI ×1
Research Timeline
2026
Emergence of Exploration in Policy Gradient Reinforcement Learning via Retrying
The paper introduces ReMax, a novel objective function that naturally encourages stochastic exploration in policy gradient reinforcement learning by evaluating expected maximum returns over multiple samples, and proposes RePPO for efficient optimization.
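As a rough illustration of what "expected maximum return over multiple samples" means, the sketch below computes that quantity for a toy discrete return distribution, once exactly and once by Monte Carlo. It is not the paper's ReMax or RePPO implementation: the function names, the toy distribution, and the choice of K are all assumptions made here for illustration.

```python
import random


def expected_max_exact(values, probs, k):
    """Exact E[max of k i.i.d. draws] from a discrete return distribution.

    Uses P(max <= v) = F(v)**k, where F is the CDF of a single draw.
    (Hypothetical helper, not taken from the paper.)
    """
    total, cdf_prev = 0.0, 0.0
    for v, p in sorted(zip(values, probs)):
        cdf = cdf_prev + p
        # Probability that the maximum of k draws equals v.
        total += v * (cdf ** k - cdf_prev ** k)
        cdf_prev = cdf
    return total


def expected_max_mc(values, probs, k, n_trials=20000, seed=0):
    """Monte Carlo estimate of the same quantity: average the best of k draws."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n_trials):
        acc += max(rng.choices(values, weights=probs, k=k))
    return acc / n_trials


# Toy example (assumed): a return of 0 or 1, each with probability 0.5.
returns, weights = [0.0, 1.0], [0.5, 0.5]
for k in (1, 2, 3):
    print(k, expected_max_exact(returns, weights, k))
```

With a single sample the value is the ordinary expected return; as K grows, the expected maximum rises toward the best achievable return, which is the sense in which retrying rewards spread in the return distribution.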