Consolidating Rewarded Perturbations for LLM Post-Training