Hakan T. Otal
1 indexed paper
Recent (6 mo): 1
With code: 0
Influential cites: 0
Benchmarked: 0
Publications per year
Top categories
Crypto ×1, AI ×1, NLP ×1
Research Timeline
2026
Systematic Capability Benchmarking of Frontier Large Language Models for Offensive Cyber Tasks
This study benchmarks 10 frontier LLMs on 200 offensive cybersecurity tasks. It finds that environment tooling and model selection are the primary drivers of performance, and that Claude 4.5 Opus achieves the highest solve rate.