Hista and Numca: Estimate State Value Effectively for LLM Reinforcement Learning | ArxivCSExplorer