When are LLMs Sufficient Policy Optimizers for Sequential RL Tasks?