~ similar to 2605.09480v1· 20 results
The paper introduces a Contextual Integrity (CI) framework and a new benchmark (DelegateCI-Bench) to rewrite user queries sent to cloud LLMs, ensuring only task-essential information is retained while…
The paper proposes an architectural proxy (MCP) to enforce robust, reliable tool access control for LLM agents, demonstrating that this structural enforcement is necessary because prompt-based restric…
Zheng Yan, Jingxiang Weng, Charles Chen, Dengyun Peng +8 more
The paper introduces a new benchmark and decomposition method, Sufficiency-Tightness Decomposition, demonstrating that current coding agents struggle to accurately infer least-privilege authorization,…
The paper introduces an automated framework demonstrating that LLM system instructions are vulnerable to encoding attacks, where structured output requests can bypass safety refusals and leak sensitiv…
Shidong Pan, Xiaoyu Sun, Tianyi Zhang, Dianshu Liao +2 more
SkillGuard introduces a novel, skill-centric permission framework to secure LLM agent skill ecosystems by jointly regulating both context influence and runtime action side effects.
The paper introduces a 'Privacy Guard' framework that simultaneously reduces operational costs and eliminates data leakage risks when using LLMs by optimizing prompts and routing queries to secure mod…
Karima Makhlouf, Lamiaa Basyoni, Syed Khaderi, Gabriel Marquez +3 more
This paper conducts a structured ablation study using a unified threat model to evaluate how various system factors (like model architecture and retrieval configuration) influence different types of p…
The paper introduces the Sovereign Context Protocol (SCP), an open-source, attribution-aware data access layer designed to standardize how Large Language Models (LLMs) connect to and track usage of hu…
The paper introduces AgentSecBench, a security evaluation framework that measures prompt injection, privacy leakage, and tool-use integrity in LLM agents by defining formal security games and testing…
The paper empirically evaluates the security quality of LLM-generated code across various prompting methods, finding that while prompting alters the structure of weaknesses, it is insufficient to reli…
Quan Zhang, Lianhang Fu, Lvsi Lian, Gwihwan Go +4 more
The paper introduces GrantBox, a new security sandbox that evaluates how well LLM agents handle real-world tool privileges, finding that agents remain highly vulnerable to sophisticated attacks.
Shashie Dilhara Batan Arachchige, Hassan Jameel Asghar, Benjamin Zi Hao Zhao, Dinusha Vatsalan +1 more
The paper proposes a character-level differential privacy mechanism to sanitize sensitive user prompts for LLMs, achieving high privacy for PII while maintaining utility for non-sensitive context.
The paper introduces ActInv and PAF to systematically analyze and quantify privacy leakage from intermediate activations during split inference of LLMs, proposing PriPert for enhanced defense.
Yuming Xu, Mingtao Zhang, Zhuohan Ge, Haoyang Li +6 more
This paper proposes a comprehensive taxonomy (SLOT) to systematically categorize security risks, attacks, and defenses specific to Retrieval-Augmented Generation (RAG), clarifying that these risks are…
Ying Li, Yanju Chen, Peiran Wang, Issac Khabra +3 more
The paper introduces Conleash, a client-side middleware that uses a risk lattice to enforce granular, boundary-scoped authorization for tool invocations, significantly improving user consent and secur…
This paper introduces EXTree, a novel structure for Attribute-based Access Control (ABAC) policies that optimizes for both fast evaluation and human-understandable explanations when access is denied.
The paper demonstrates that encoding harmful prompts as genuine mathematical problems, rather than just using mathematical formatting, effectively bypasses the safety filters of large language models.
BodhiPromptShield is a policy-aware framework that mediates prompt privacy by detecting sensitive data and replacing it with secure placeholders across multiple stages (retrieval, memory, tools) to pr…
The paper introduces a validated, consensus-labeled prompt bank that separates requests for executable malicious code (weapons) from requests for general harmful security knowledge, providing a more g…
Zihan Wang, Rui Zhang, Yu Liu, Chi Liu +3 more
This paper presents the first systematic study of black-box skill stealing attacks against proprietary LLM agents, demonstrating that structured agent skills can be easily extracted, posing a signific…