Chao Shen
8 indexed papers
Publications per year
Top categories
Frequent co-authors
Research Timeline
The paper analyzes that while multimodal large language models (MLLMs) offer superior semantic understanding for image generation, this enhanced capability significantly increases safety risks, particularly in generating unsafe content and creating harder-to-detect fake images compared to traditional diffusion models.
TEMPLATEFUZZ is a fine-grained fuzzing framework that systematically tests chat templates to find vulnerabilities in LLMs, achieving high jailbreak success rates with minimal performance degradation.
This paper introduces TwoHamsters, a new benchmark that rigorously tests Multi-Concept Compositional Unsafety (MCCU) in text-to-image models, demonstrating that current state-of-the-art models and safety defenses are highly vulnerable to subtle, compositionally unsafe prompts.
The paper introduces MGTEVAL, a comprehensive and extensible platform designed to systematically evaluate the performance, robustness, and efficiency of machine-generated text detectors.
The paper introduces AgentDoG 1.5, a lightweight and scalable alignment framework that significantly improves AI agent safety and security for complex, open-world agentic scenarios.
The paper introduces AgentDoG 1.5, a lightweight and scalable alignment framework that significantly improves AI agent safety and security for complex open-world agent deployments.
SeClaw is a new framework that synthesizes security tasks from structured risk specifications to evaluate autonomous LLM agents' behavior in stateful environments, focusing on the process of unsafe actions rather than just the final outcome.
SeClaw is a new framework that uses specification-driven task synthesis to create comprehensive and controllable security benchmarks for evaluating the unsafe behaviors of autonomous LLM agents.
Papers
SeClaw: Spec-Driven Security Task Synthesis for Evaluating Autonomous Agents
Hao Cheng, Changtao Miao, Tianle Song, Yin Wu +20 more
SeClaw is a new framework that synthesizes security tasks from structured risk specifications to evaluate autonomous LLM agents' behavior in stateful environments, focusing on the process of unsafe ac…