SecGoal: A Benchmark for Extracting Formalizable Security Goals from Protocol Documents
The paper introduces SecGoal, a benchmark dataset and framework, demonstrating that fine-tuning smaller LLMs on this dataset significantly improves the precision of extracting formalizable security goals from natural language protocol documents.
Abstract
More Like ThisFormal verification provides rigorous guarantees for cryptographic security, yet extracting formalizable security goals from natural-language protocol documents remains largely manual. We introduce SecGoal, a dedicated expert-annotated dataset and benchmark for extracting formalizable security goal statements from protocol documents, covering 15 widely deployed protocols, together with AIFG, a schema- and flow-conditioned framework for structured formal security property generation. Our evaluation shows that frontier and large LLMs achieve high property recall but low extraction precision because they often fail to distinguish formalizable security goals from non-goal protocol content. In contrast, SecGoal fine-tuning makes smaller open-source LLMs substantially more selective extractors of formalizable security goals. On the held-out test protocols, Gemma2-9B-FT improves extraction precision from 24.0\% to 66.6\% and reaches 97.6\% property recall, outperforming larger prompted LLMs and encoder baselines. In a controlled setting, AIFG shows that concise goal inputs can support high-recall structured property generation, while expert-vetted extracted inputs reveal over-generation as the main remaining bottleneck. Together, SecGoal and AIFG provide a dataset, benchmark, and framework for specification-grounded security goal extraction and property generation.