Hongyu He
2 indexed papers
Research Timeline
The paper introduces a quotient-DAG view to accurately estimate unordered slate propensities for off-policy evaluation, addressing the nuisance variance and computational gap inherent in standard importance sampling for autoregressive recommenders.
SIRI introduces a self-internalizing reinforcement learning framework that allows LLM agents to autonomously discover and integrate reusable skills directly into their core policy, significantly improving performance on complex tasks without external skill generators.
Papers
SIRI: Self-Internalizing Reinforcement Learning with Intrinsic Skills for LLM Agent Training
Zhongyu He, Yuanfan Li, Fei Huang, Tianyu Chen +8 more
SIRI introduces a self-internalizing reinforcement learning framework that allows LLM agents to autonomously discover and integrate reusable skills directly into their core policy, significantly impro…