Systematic Capability Benchmarking of Frontier Large Language Models for Offensive Cyber Tasks | ArxivCSExplorer