Zeming Wei
2 indexed papers
Publications per year
Top categories
Frequent co-authors
Research Timeline
The paper introduces Salami Slicing Risk, a novel multi-turn jailbreak technique that accumulates harmful intent through numerous low-risk inputs, achieving state-of-the-art attack success rates against major LLMs.
The paper introduces SkillSafetyBench, a comprehensive benchmark demonstrating that agent safety failures often stem from adversarial influences within reusable skills and execution environments, rather than just malicious user prompts.
Papers
SkillSafetyBench: Evaluating Agent Safety under Skill-Facing Attack Surfaces
Chang Jin, An Wang, Zeming Wei, Kai Wang +6 more
The paper introduces SkillSafetyBench, a comprehensive benchmark demonstrating that agent safety failures often stem from adversarial influences within reusable skills and execution environments, rath…