~ similar to 2605.30604· 20 results
The paper proposes an organization-scoped LLM agent runtime architecture designed to provide an auditable, model-agnostic platform for regulated cybersecurity operations, integrating deeply with exist…
Qian'ang Mao, Jiaxin Wang, Ya Liu, Li Zhu +2 more
The paper develops a unified, cross-layer security framework for autonomous LLM agents operating in agentic commerce, identifying key attack vectors and proposing a layered defense architecture.
The paper proposes an end-to-end LLM framework that automates SOC operations by integrating ensemble-based threat detection, syntax-constrained query generation, and evidence-grounded incident resolut…
Agent Audit is a novel security analysis system that comprehensively audits LLM agent applications by examining the entire software stack—including tool code, configuration, and prompts—to detect a wi…
Red-MIRROR is a novel multi-agent LLM system that automates complex web penetration testing by integrating a memory-reflection backbone, achieving superior performance on industry benchmarks.
The paper proposes a Semantic Gateway and a Zero-Trust security model to formally validate and secure autonomous AI agents operating in enterprise systems, achieving a 100% discovery rate of unauthori…
Minfeng Qi, Tianqing Zhu, Zijie Xu, Congcong Zhu +2 more
The paper introduces CAESAR, a novel multi-agent framework that coordinates LLM agents across five specialized roles to improve success rates and stability in complex, multi-stage cyber intrusion task…
The paper proposes Proof-Carrying Agent Actions (PCAA), a runtime-neutral governance model that uses action certificates to consistently track and authorize high-risk actions across diverse and hetero…
LanG is a governance-aware, open-source agentic AI platform that unifies security operations by providing advanced correlation, automated rule generation, and attack reconstruction capabilities.
The paper proposes Dynamic Cyber Ranges, an advanced cyber range environment using LLM-driven Defender agents to counter the saturation of traditional security benchmarks, demonstrating that these dyn…
The paper proposes CyberAId, a hybrid multi-agent system designed to enhance cybersecurity for financial institutions by integrating specialized LLM subagents with existing SIEM/XDR telemetry, address…
The paper introduces a challenging benchmark for LLM agents to perform unsupervised threat hunting on raw Windows event logs, finding that current frontier models perform poorly and are not ready for…
Tanzim Ahad, Ismail Hossain, Md Jahangir Alam, Sai Puppala +3 more
The paper introduces Semantic Intent Fragmentation (SIF), an attack class demonstrating that multi-agent AI orchestrators can violate security policies through a composition of individually benign sub…
The paper introduces a comprehensive security framework, AgentRFC, to systematically analyze and test the security conformance of various AI agent protocols, identifying critical design gaps, especial…
Agent libOS introduces a library-OS-inspired runtime substrate that treats LLM agents as schedulable processes, providing explicit capability control and robust auditing for long-running, stateful age…
The paper introduces an agentic workflow that uses large language models (LLMs) combined with structured querying and constrained tools to automate and significantly improve the accuracy of initial se…
Rishikesh Sahay, Bell Eapen, Weizhi Meng, Md Rasel Al Mamun +4 more
The paper proposes an automated, LLM-enabled threat hunting framework integrated with Splunk to help SOC analysts autonomously monitor evolving threats and prioritize suspicious network traffic.
This paper systematically maps the expanded attack surface of agentic AI systems, identifying new threat vectors like RAG poisoning and cross-agent manipulation, and proposes a comprehensive security…
SOCpilot is a system that verifies the compliance of LLM-drafted incident response plans against mandatory policies and required procedural steps, significantly improving the reliability of AI-assiste…
The paper introduces MolTrust, a production-deployed trust infrastructure built on W3C standards (VCs and DIDs) that provides a verifiable, multi-layered authorization framework for autonomous AI agen…