Xu Wang
10 indexed papers
Publications per year
Top categories
Frequent co-authors
Research Timeline
PlanTwin introduces a privacy-preserving architecture that allows cloud-hosted LLMs to plan over sensitive local environments by projecting the raw state into a sanitized, abstract digital twin.
ClawKeeper is a comprehensive, multi-layered security framework designed to mitigate critical vulnerabilities in autonomous agent runtimes like OpenClaw by enforcing protection across skills, plugins, and system state.
This paper systematizes the security challenges of open agentic systems, concluding that while attack characterization is mature, the field lacks robust guidelines for operational governance, memory integrity, and capability revocation.
This survey provides a comprehensive, structured review of safety research in Embodied AI, analyzing attacks and defenses across the entire embodied pipeline to guide the development of safe, robust, and reliable real-world agents.
The paper proposes the first general defense framework to make all union-preserving Differential Privacy (DP) protocols, specifically those based on shuffle-DP, resilient against poisoning attacks.
DarkLLM introduces a novel framework that uses a Large Language Model (LLM) to translate natural language instructions into flexible, latent adversarial attack vectors, demonstrating a systemic vulnerability across diverse foundation models.
The paper introduces Knowledge-Intensive Video Generation (KIVI) as a challenging benchmark for evaluating video models on factuality and practical usefulness, showing that current state-of-the-art systems still struggle to match human performance.
BraveGuard is a self-evolving defense framework that improves the safety of computer-use agents by training guard models on open-world, multi-step threat trajectories rather than static benchmarks.
BraveGuard is a self-evolving defense framework that significantly improves the safety monitoring of computer-use agents by generating guard model supervision from open-world threat discovery and realistic, multi-step execution trajectories.
SentGuard introduces a novel sentence-level streaming guardrail that operates in parallel with LLM generation, achieving high detection rates of unsafe content early in the response while maintaining low false-positive rates.
Papers
SentGuard: Sentence-Level Streaming Guardrails for Large Language Models
SentGuard introduces a novel sentence-level streaming guardrail that operates in parallel with LLM generation, achieving high detection rates of unsafe content early in the response while maintaining…