~ similar to 2606.02674v1· 20 results
The paper introduces CSTM-Bench, a comprehensive benchmark and evaluation framework demonstrating that standard session-bound AI guardrails fail against sophisticated, cross-session attacks that accum…
The paper introduces a challenging benchmark for LLM agents to perform unsupervised threat hunting on raw Windows event logs, finding that current frontier models perform poorly and are not ready for…
The paper introduces SecLens-R, a multi-stakeholder evaluation framework, demonstrating that LLM performance for vulnerability detection varies significantly depending on the specific priorities (e.g.…
The paper introduces MolTrust, a production-deployed trust infrastructure built on W3C standards (VCs and DIDs) that provides a verifiable, multi-layered authorization framework for autonomous AI agen…
Linfeng Fan, Ziwei Li, Yuan Tian, Yichen Wang +2 more
The paper introduces PACT, a provenance-aware runtime monitor that enhances agent security by tracking the origin and trust of individual tool arguments, solving the granularity mismatch in LLM agent…
Minghui Xu, Xiaoyu Liu, Yihao Guo, Chunchi Liu +2 more
The paper proposes AgentDID, a decentralized framework using DIDs and verifiable credentials to provide trustless identity authentication and dynamic state verification for autonomous, self-managed AI…
The paper proposes an end-to-end LLM framework that automates SOC operations by integrating ensemble-based threat detection, syntax-constrained query generation, and evidence-grounded incident resolut…
Pengyu Zhu, Lijun Li, Yaxing Lyu, Qianxin Luo +7 more
The paper introduces a unified framework to fairly evaluate LLM agentic capabilities by standardizing diverse benchmarks and separating the effects of the LLM model from the surrounding framework and…
Hengyu An, Minxi Li, Jinghuai Zhang, Naen Xu +5 more
The paper introduces ACIArena, a unified and comprehensive evaluation framework designed to systematically test the robustness of Multi-Agent Systems against complex Agent Cascading Injection attacks.
LLM-FACETS introduces an open-source, privacy-preserving framework designed to enable non-technical domain experts and compliance officers to audit and evaluate the transparency and accountability of…
Wenjie Fu, Xiaoting Qin, Jue Zhang, Qingwei Lin +4 more
The paper introduces CI-Work, a benchmark demonstrating that current enterprise LLM agents frequently leak sensitive information while performing tasks, suggesting that privacy protection requires arc…
The paper proposes the Layered Attack Surface Model (LASM), a structural taxonomy that maps security threats and defenses across the complex, multi-layered architecture of AI agents, revealing signifi…
The paper introduces a comprehensive taxonomy and auditing framework to assess the collective coverage of existing LLM attack benchmarks, revealing significant and systematic gaps in current testing m…
Huijun Zhou, Xiaohan Zhang, Haozhe Zhang, Haoyang Zhang +2 more
This study provides the first measurement of authentication security in real-world remote Model Context Protocol (MCP) servers, finding pervasive and critical authentication weaknesses, particularly i…
Chenning Li, Pan Hu, Justin Xu, Baris Ozbas +8 more
The paper introduces ADR, a novel, production-proven detection system that provides high-fidelity security monitoring for AI agents operating via the Model Context Protocol, significantly outperformin…
The paper introduces AIP, a novel protocol using Invocation-Bound Capability Tokens (IBCTs) to provide verifiable identity and secure delegation across Model Context Protocol (MCP) and Agent-to-Agent…
Xixun Lin, Yang Liu, Yancheng Chen, Yongxuan Wu +7 more
The paper introduces SafeHarness, a novel, lifecycle-integrated security architecture that significantly reduces unsafe behavior and attack success rates in LLM agents by weaving multiple defense laye…
The paper proposes a novel, empirical methodology called 'backchaining' to derive and prioritize Loss of Control (LoC) mitigations by analyzing the errors an AI system makes on mission-specific nation…
The paper proposes an organization-scoped LLM agent runtime architecture designed to provide an auditable, model-agnostic platform for regulated cybersecurity operations, integrating deeply with exist…
The paper proposes a novel, organization-scoped LLM agent runtime architecture designed specifically for regulated cybersecurity operations, ensuring auditable context and integration with existing se…