HPO: Hysteretic Policy Optimization for Stable and Efficient Training under Sparse-Reward Regime