Xin Yao
2 indexed papers
Research Timeline
The paper introduces SafeRedirect, a system-level defense that prevents frontier LLMs from generating harmful content during legitimate tasks that structurally require it, significantly reducing unsafe generation rates.
The paper introduces SABER, a new benchmark that evaluates the operational safety of LLM coding agents in complex, stateful project environments, finding that current models have a high rate of harmful safety violations.
Papers
SABER: Benchmarking Operational Safety of LLM Coding Agents in Stateful Project Workspaces
Qi Hu, Yifeng Tang, Qinghua Wang, Lanyang Zhao +6 more