Papers similar to 2606.02470

~ similar to 2606.02470· 20 results

cs.CRcs.AIRecentApr 7, 2026

A Formal Security Framework for MCP-Based AI Agents: Threat Taxonomy, Verification Models, and Defense Mechanisms

The paper introduces MCPSHIELD, a comprehensive formal security framework that systematically characterizes and provides a defense-in-depth architecture for the rapidly adopted but insecure Model Cont…

View →

cs.CRcs.CLRecentApr 23, 2026

CI-Work: Benchmarking Contextual Integrity in Enterprise LLM Agents

Wenjie Fu, Xiaoting Qin, Jue Zhang, Qingwei Lin +4 more

The paper introduces CI-Work, a benchmark demonstrating that current enterprise LLM agents frequently leak sensitive information while performing tasks, suggesting that privacy protection requires arc…

View →

cs.AIRecentMay 27, 2026

A Unified Framework for the Evaluation of LLM Agentic Capabilities

Pengyu Zhu, Lijun Li, Yaxing Lyu, Qianxin Luo +7 more

The paper introduces a unified framework to fairly evaluate LLM agentic capabilities by standardizing diverse benchmarks and separating the effects of the LLM model from the surrounding framework and…

View →

cs.CRcs.AIRecentMay 22, 2026

When the Manual Lies: A Realistic Benchmark to Evaluate MCP Poisoning Attacks for LLM Agents

Shi Liu, Xuehai Tang, Xikang Yang, Liang Lin +3 more

This paper introduces a new benchmark to test Tool Description Poisoning (TDP) attacks on LLM agents, demonstrating that even advanced models like GPT-4o are highly vulnerable and that current defense…

View →

cs.CRcs.AIRecentApr 8, 2026

MCP-DPT: A Defense-Placement Taxonomy and Coverage Analysis for Model Context Protocol Security

Mehrdad Rostamzadeh, Sidhant Narula, Nahom Birhan, Mohammad Ghasemigol +1 more

The paper introduces a defense-placement taxonomy for the Model Context Protocol (MCP) to systematically analyze security gaps, revealing that many vulnerabilities stem from architectural misalignment…

View →

cs.CRcs.SERecentMar 23, 2026

Model Context Protocol Threat Modeling and Analyzing Vulnerabilities to Prompt Injection with Tool Poisoning

Charoes Huang, Xin Huang, Ngoc Phu Tran, Amin Milani Fard

This paper analyzes the security vulnerabilities of the Model Context Protocol (MCP), identifying tool poisoning as the most critical client-side threat, and proposes a multi-layered defense strategy.

View →

cs.CRcs.AIRecentApr 15, 2026

MCPThreatHive: Automated Threat Intelligence for Model Context Protocol Ecosystems

Yi Ting Shen, Kentaroh Toyoda, Alex Leung

MCPThreatHive is an open-source platform that automates the entire threat intelligence lifecycle for Model Context Protocol (MCP) agentic systems, addressing critical gaps in current security tooling.

View →

cs.CRRecentMay 15, 2026

PersonaFingerprint: Measuring Persona Inference on Modern Websites with LLM-Driven Browsing

Chuxu Song, Hao Wang, Richard Martin

This paper demonstrates that encrypted traffic metadata (packet lengths and timing) can leak a user's persona, achieving high inference accuracy across multiple modern websites.

View →

cs.AIRecentMay 27, 2026

EgoBench: An Interactive Egocentric Multimodal Benchmark for Tool-Using Agents

Yunqi Liu, Tong Niu, Zitong Wang, Zhenlong Dai +3 more

The paper introduces EgoBench, the first interactive multimodal benchmark designed to jointly evaluate advanced AI agents' capabilities in visual perception, multi-hop reasoning, and dynamic tool usag…

View →

cs.CRcs.AIcs.SERecentJun 3, 2026

Description-Code Inconsistency in Real-world MCP Servers: Measurement, Detection, and Security Implications

Yutao Shi, Xiaohan Zhang, Xiangjing Zhang, Xihua Shen +4 more

This paper investigates Description-Code Inconsistency (DCI) in Model Context Protocol (MCP) servers, finding that 9.93% of real-world tools exhibit inconsistencies that create security blind spots.

View →

cs.CRcs.AIcs.LGRecentMar 28, 2026

Sovereign Context Protocol: An Open Attribution Layer for Human-Generated Content in the Age of Large Language Models

Praneel Panchigar, Torlach Rush, Matthew Canabarro

The paper introduces the Sovereign Context Protocol (SCP), an open-source, attribution-aware data access layer designed to standardize how Large Language Models (LLMs) connect to and track usage of hu…

View →

cs.CLRecentJun 1, 2026

CRAB-Bench: Evaluating LLM Agents under Complex Task Dependencies and Human-aligned User Simulation

Danqing Wang, Akshay Sivaraman, Lei Li

The paper introduces CRAB-Bench and RUSE, a rigorous evaluation framework that tests LLM agents on complex, interdependent tasks with realistic human user interactions, revealing significant performan…

View →

cs.AIcs.CLRecentMay 28, 2026

Notation Matters: A Benchmark Study of Token-Optimized Formats in Agentic AI Systems

Lorenz Kutschka, Bernhard Geiger

This study benchmarks token-optimized formats (TOON and TRON) against JSON in end-to-end agentic AI systems, finding that TRON significantly reduces token overhead with minimal performance degradation…

View →

cs.AIcs.CLcs.CYRecentJun 1, 2026

SafeMCP: Proactive Power Regulation for LLM Agent Defense via Environment-Grounded Look-Ahead Reasoning

Lichao Wang, Zhaoxing Ren, Tianzhuo Yang, Jiaming Ji +3 more

SafeMCP is a server-side defense plugin that uses look-ahead reasoning to proactively filter and constrain tool acquisition for LLM agents, thereby mitigating catastrophic risks associated with expand…

View →

cs.AIRecentJun 1, 2026

SMH-Bench: Benchmarking LLM Agents for Environment-Grounded Reasoning and Action in Smart Homes

Kuan Li, Shuo Zhang, Huacan Wang, Fangzhou Yu +11 more

The paper introduces SMH-Bench, a comprehensive benchmark built on a simulator to rigorously test LLM agents' ability to perform complex, environment-grounded reasoning and actions in realistic smart-…

View →

cs.CLRecentJun 1, 2026

Beyond Isolated Behaviors: Hierarchical User Modeling for LLM Personalization

Liang Wang, Xinyi Mou, Xiaoyou Liu, Tiannan Wang +2 more

The paper proposes a hierarchical framework, PHF (Practice-Habitus-Field), inspired by Bourdieu's Theory of Practice, to improve LLM personalization by modeling user behaviors at three distinct levels…

View →

cs.CLcs.AIRecentMay 28, 2026

Adaptive Interviewing for Persona Simulation in LLMs: Evidence-Grounded Reasoning Improves Decision Alignment

Ruoxi Su, Yuhan Liu, Jingyu Hu

The paper introduces an adaptive interview framework to gather rich persona context, demonstrating that LLMs improve decision alignment in moral dilemmas only when they selectively ground their decisi…

View →

cs.AIRecentMay 28, 2026

Indexing the Unreadable: LLM-Native Recursive Construction and Search of Service Taxonomies

Wei Zheng, Yang Yan, Yiyang Shao, Jinyang Li +5 more

The paper proposes A2X, an LLM-native progressive-disclosure scheme that structures service taxonomies hierarchically and searches them layer-by-layer at query time, solving context overflow and impro…

View →

cs.CRRecentMay 21, 2026

A First Measurement Study on Authentication Security in Real-World Remote MCP Servers

Huijun Zhou, Xiaohan Zhang, Haozhe Zhang, Haoyang Zhang +2 more

This study provides the first measurement of authentication security in real-world remote Model Context Protocol (MCP) servers, finding pervasive and critical authentication weaknesses, particularly i…

View →

cs.SEcs.AIRecentMay 27, 2026

DeltaMCP: Incremental Regeneration via Spec-Aware Transformation for MCP servers

Aditya Pujara, Xiaogang Zhu, Hsiang-Ting Chen

DeltaMCP is a specification-aware, incremental regeneration tool that efficiently updates Model Context Protocol (MCP) servers by only modifying affected tooling when a service's OpenAPI specification…

View →