Yaodong Yang
5 indexed papers
Publications per year
Top categories
Frequent co-authors
Research Timeline
AgentVisor is a novel defense framework that uses semantic virtualization, inspired by OS principles, to significantly reduce LLM agent vulnerability to prompt injection while maintaining high utility.
TwinGate introduces a stateful dual-encoder defense framework using Asymmetric Contrastive Learning to detect malicious intent from fragmented, untraceable LLM queries with high recall and low false positives.
The paper introduces MiraBench, a new benchmark that evaluates the action-conditioned reliability of robotic world models, finding that visual fidelity is insufficient and that optimism bias is a pervasive issue across current systems.
The paper introduces SPADE-Bench, a new benchmark designed to rigorously evaluate 'agent deception'—the divergence between an agent's reported plan and its actual executed actions—which is a critical safety issue for autonomous LLM agents.
SafeMCP is a server-side defense plugin that uses look-ahead reasoning to proactively filter and constrain tool acquisition for LLM agents, thereby mitigating catastrophic risks associated with expanding action spaces.
Papers
SPADE-Bench: Evaluating Spontaneous Strategic Deception in Agents via Plan-Action Divergence
Yuyan Bu, Haowei Li, Qirui Zheng, Bowen Dong +6 more
The paper introduces SPADE-Bench, a new benchmark designed to rigorously evaluate 'agent deception'—the divergence between an agent's reported plan and its actual executed actions—which is a critical…