~ similar to 2606.02470· 20 results
The paper introduces MCPSHIELD, a comprehensive formal security framework that systematically characterizes and provides a defense-in-depth architecture for the rapidly adopted but insecure Model Cont…
Wenjie Fu, Xiaoting Qin, Jue Zhang, Qingwei Lin +4 more
The paper introduces CI-Work, a benchmark demonstrating that current enterprise LLM agents frequently leak sensitive information while performing tasks, suggesting that privacy protection requires arc…
Pengyu Zhu, Lijun Li, Yaxing Lyu, Qianxin Luo +7 more
The paper introduces a unified framework to fairly evaluate LLM agentic capabilities by standardizing diverse benchmarks and separating the effects of the LLM model from the surrounding framework and…
Shi Liu, Xuehai Tang, Xikang Yang, Liang Lin +3 more
This paper introduces a new benchmark to test Tool Description Poisoning (TDP) attacks on LLM agents, demonstrating that even advanced models like GPT-4o are highly vulnerable and that current defense…
The paper introduces a defense-placement taxonomy for the Model Context Protocol (MCP) to systematically analyze security gaps, revealing that many vulnerabilities stem from architectural misalignment…
This paper analyzes the security vulnerabilities of the Model Context Protocol (MCP), identifying tool poisoning as the most critical client-side threat, and proposes a multi-layered defense strategy.
MCPThreatHive is an open-source platform that automates the entire threat intelligence lifecycle for Model Context Protocol (MCP) agentic systems, addressing critical gaps in current security tooling.
This paper demonstrates that encrypted traffic metadata (packet lengths and timing) can leak a user's persona, achieving high inference accuracy across multiple modern websites.
Yunqi Liu, Tong Niu, Zitong Wang, Zhenlong Dai +3 more
The paper introduces EgoBench, the first interactive multimodal benchmark designed to jointly evaluate advanced AI agents' capabilities in visual perception, multi-hop reasoning, and dynamic tool usag…
Yutao Shi, Xiaohan Zhang, Xiangjing Zhang, Xihua Shen +4 more
This paper investigates Description-Code Inconsistency (DCI) in Model Context Protocol (MCP) servers, finding that 9.93% of real-world tools exhibit inconsistencies that create security blind spots.
The paper introduces the Sovereign Context Protocol (SCP), an open-source, attribution-aware data access layer designed to standardize how Large Language Models (LLMs) connect to and track usage of hu…
The paper introduces CRAB-Bench and RUSE, a rigorous evaluation framework that tests LLM agents on complex, interdependent tasks with realistic human user interactions, revealing significant performan…
This study benchmarks token-optimized formats (TOON and TRON) against JSON in end-to-end agentic AI systems, finding that TRON significantly reduces token overhead with minimal performance degradation…
Lichao Wang, Zhaoxing Ren, Tianzhuo Yang, Jiaming Ji +3 more
SafeMCP is a server-side defense plugin that uses look-ahead reasoning to proactively filter and constrain tool acquisition for LLM agents, thereby mitigating catastrophic risks associated with expand…
Kuan Li, Shuo Zhang, Huacan Wang, Fangzhou Yu +11 more
The paper introduces SMH-Bench, a comprehensive benchmark built on a simulator to rigorously test LLM agents' ability to perform complex, environment-grounded reasoning and actions in realistic smart-…
Liang Wang, Xinyi Mou, Xiaoyou Liu, Tiannan Wang +2 more
The paper proposes a hierarchical framework, PHF (Practice-Habitus-Field), inspired by Bourdieu's Theory of Practice, to improve LLM personalization by modeling user behaviors at three distinct levels…
The paper introduces an adaptive interview framework to gather rich persona context, demonstrating that LLMs improve decision alignment in moral dilemmas only when they selectively ground their decisi…
Wei Zheng, Yang Yan, Yiyang Shao, Jinyang Li +5 more
The paper proposes A2X, an LLM-native progressive-disclosure scheme that structures service taxonomies hierarchically and searches them layer-by-layer at query time, solving context overflow and impro…
Huijun Zhou, Xiaohan Zhang, Haozhe Zhang, Haoyang Zhang +2 more
This study provides the first measurement of authentication security in real-world remote Model Context Protocol (MCP) servers, finding pervasive and critical authentication weaknesses, particularly i…
DeltaMCP is a specification-aware, incremental regeneration tool that efficiently updates Model Context Protocol (MCP) servers by only modifying affected tooling when a service's OpenAPI specification…