Yutaka Matsuo

2 indexed papers

Top categories

ML ×2, AI ×2

Frequent co-authors

Shin Ishii ×1

Research Timeline

2026

Zipping the Thought: When and How Compressed Reasoning Data Works in LLM Post-Training

This paper investigates how different types of compressed reasoning data (Explicit, Composed, and Implicit CoT) affect LLM performance during post-training, finding that the choice of compression type and of the subsequent fine-tuning method significantly affects generalization and data scaling.

Emergence of Exploration in Policy Gradient Reinforcement Learning via Retrying

The paper introduces ReMax, a novel objective function that naturally encourages stochastic exploration in policy gradient reinforcement learning by evaluating expected maximum returns over multiple samples, and proposes RePPO for efficient optimization.
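As a rough illustration of why an expected-maximum objective can favor stochastic behavior, the sketch below estimates the expected best return over k retries by Monte Carlo for two toy policies. It is a minimal sketch under stated assumptions, not the paper's ReMax or RePPO implementation: the function `expected_max_return`, the two toy return samplers, and the Bernoulli reward setup are all hypothetical and chosen only for illustration.

```python
import random


def expected_max_return(sample_return, k, n_trials=10_000, seed=0):
    """Monte Carlo estimate of E[max(R_1, ..., R_k)] for i.i.d. returns.

    sample_return: hypothetical callable taking a random.Random and
                   returning the return of one sampled episode.
    k:             number of retries (samples) per trial.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        # Keep only the best of k independent attempts.
        total += max(sample_return(rng) for _ in range(k))
    return total / n_trials


def deterministic_return(rng):
    # Toy deterministic policy: always earns 0.5.
    return 0.5


def stochastic_return(rng):
    # Toy stochastic policy: earns 1.0 or 0.0 with equal probability,
    # so its plain expected return is also 0.5.
    return 1.0 if rng.random() < 0.5 else 0.0


if __name__ == "__main__":
    for k in (1, 4):
        det = expected_max_return(deterministic_return, k)
        sto = expected_max_return(stochastic_return, k)
        print(f"k={k}: deterministic={det:.3f}, stochastic={sto:.3f}")
```

With a single sample the two toy policies have the same expected return, but with four retries the stochastic policy's expected maximum approaches 1 - 0.5^4 = 0.9375 while the deterministic one stays at 0.5. In this toy setting, scoring a policy by its best of several attempts rewards spread in outcomes.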

Papers

cs.LG, cs.AI | Recent | May 29, 2026

Emergence of Exploration in Policy Gradient Reinforcement Learning via Retrying

Soichiro Nishimori, Paavo Parmas, Sotetsu Koyamada, Tadashi Kozuno +3 more

cs.AI, cs.LG | Recent | May 27, 2026