Yixu Wang
5 indexed papers
Publications per year
Top categories
Frequent co-authors
Research Timeline
This survey provides a comprehensive, structured review of safety research in Embodied AI, analyzing attacks and defenses across the entire embodied pipeline to guide the development of safe, robust, and reliable real-world agents.
DarkLLM introduces a novel framework that uses a Large Language Model (LLM) to translate natural language instructions into flexible, latent adversarial attack vectors, demonstrating a systemic vulnerability across diverse foundation models.
BraveGuard is a self-evolving defense framework that improves the safety of computer-use agents by training guard models on open-world, multi-step threat trajectories rather than static benchmarks.
BraveGuard is a self-evolving defense framework that significantly improves the safety monitoring of computer-use agents by generating guard model supervision from open-world threat discovery and realistic, multi-step execution trajectories.
SentGuard introduces a novel sentence-level streaming guardrail that operates in parallel with LLM generation, achieving high detection rates of unsafe content early in the response while maintaining low false-positive rates.
Papers
SentGuard: Sentence-Level Streaming Guardrails for Large Language Models
SentGuard introduces a novel sentence-level streaming guardrail that operates in parallel with LLM generation, achieving high detection rates of unsafe content early in the response while maintaining…