Built by Teycir Ben Soltane
ArXivCSExplorer

Paavo Parmas

1 indexed paper

Recent (6 mo): 1
With code: 0
Influential cites: 0
Benchmarked: 0

Publications per year

2026: 1

Top categories

ML ×1, AI ×1

Frequent co-authors

Soichiro Nishimori 1×
Sotetsu Koyamada 1×
Tadashi Kozuno 1×
Toshinori Kitamura 1×
Shin Ishii 1×
Yutaka Matsuo 1×

Research Timeline

2026
Emergence of Exploration in Policy Gradient Reinforcement Learning via Retrying

The paper introduces ReMax, a novel objective function that naturally encourages stochastic exploration in policy gradient reinforcement learning by evaluating expected maximum returns over multiple samples, and proposes RePPO for efficient optimization.


Papers

cs.LG, cs.AI | Recent | May 29, 2026

Emergence of Exploration in Policy Gradient Reinforcement Learning via Retrying

Soichiro Nishimori, Paavo Parmas, Sotetsu Koyamada, Tadashi Kozuno +3 more
