Chunqiu Steven Xia

1 indexed paper


Top categories

Crypto ×1, ML ×1

Frequent co-authors

Hwiwon Lee (1×)

Jiawei Liu (1×)

Dongjun Kim (1×)

Ziqi Zhang (1×)

Lingming Zhang (1×)

Research Timeline

2026

SEC-bench Pro: Can Language Models Solve Long-Horizon Software Security Tasks?

The paper introduces SEC-bench Pro, a rigorous benchmark for evaluating LLM-based bug hunting on complex software, finding that even advanced agents struggle with long-horizon security tasks.


Papers

cs.CR, cs.LG · Recent · May 26, 2026

SEC-bench Pro: Can Language Models Solve Long-Horizon Software Security Tasks?

Hwiwon Lee, Jiawei Liu, Dongjun Kim, Ziqi Zhang +2 more

