Efficient Post-training of LLMs for Code Generation With Offline Reinforcement Learning