An Enigma of Artificial Reason: Investigating the Production-Evaluation Gap in Large Reasoning Models