Design Principles for the Construction of a Benchmark Evaluating Security Operation Capabilities of Multi-agent AI Systems | ArxivCSExplorer