Yaodong Yang

5 indexed papers

Recent (6 mo)

With code

Influential cites

Benchmarked

Publications per year

Top categories

NLP×3AI×3Crypto×2Society×1ML×1

Frequent co-authors

Jiaming Ji3×

Juntao Dai3×

Tianzhuo Yang2×

Yuyan Bu1×

Haowei Li1×

Qirui Zheng1×

Research Timeline

2026

AgentVisor: Defending LLM Agents Against Prompt Injection via Semantic Virtualization

AgentVisor is a novel defense framework that uses semantic virtualization, inspired by OS principles, to significantly reduce LLM agent vulnerability to prompt injection while maintaining high utility.

TwinGate: Stateful Defense against Decompositional Jailbreaks in Untraceable Traffic via Asymmetric Contrastive Learning

TwinGate introduces a stateful dual-encoder defense framework using Asymmetric Contrastive Learning to detect malicious intent from fragmented, untraceable LLM queries with high recall and low false positives.

MiraBench: Evaluating Action-Conditioned Reliability in Robotic World Models

The paper introduces MiraBench, a new benchmark that evaluates the action-conditioned reliability of robotic world models, finding that visual fidelity is insufficient and that optimism bias is a pervasive issue across current systems.

SPADE-Bench: Evaluating Spontaneous Strategic Deception in Agents via Plan-Action Divergence

The paper introduces SPADE-Bench, a new benchmark designed to rigorously evaluate 'agent deception'—the divergence between an agent's reported plan and its actual executed actions—which is a critical safety issue for autonomous LLM agents.

SafeMCP: Proactive Power Regulation for LLM Agent Defense via Environment-Grounded Look-Ahead Reasoning

SafeMCP is a server-side defense plugin that uses look-ahead reasoning to proactively filter and constrain tool acquisition for LLM agents, thereby mitigating catastrophic risks associated with expanding action spaces.

Highlighted terms show continued research focus across papers

Papers

cs.CLcs.AIRecentJun 1, 2026

SPADE-Bench: Evaluating Spontaneous Strategic Deception in Agents via Plan-Action Divergence

Yuyan Bu, Haowei Li, Qirui Zheng, Bowen Dong +6 more

View →

cs.AIcs.CLcs.CYRecentJun 1, 2026