Yang Cao
5 indexed papers
Research Timeline
The paper introduces the Black-Hole Attack, a poisoning vulnerability that exploits geometric defects in high-dimensional embedding spaces to force malicious vectors into the top-k results of vector database queries.
The paper proposes Multi-Adapter Representation Interventions via Energy Calibration (MARI), a method that adaptively adjusts the strength and direction of interventions across different inputs to improve alignment without degrading general model capabilities.
The paper proposes a novel geometric embedding hashing method to recover object correspondences (vector links) between two embedding clouds generated by different black-box encoders using only a small set of paired anchors.
SeClaw is a new framework that synthesizes security tasks from structured risk specifications to evaluate autonomous LLM agents' behavior in stateful environments, focusing on the process of unsafe actions rather than just the final outcome.
SeClaw is a new framework that uses specification-driven task synthesis to create comprehensive and controllable security benchmarks for evaluating the unsafe behaviors of autonomous LLM agents.
Papers
SeClaw: Spec-Driven Security Task Synthesis for Evaluating Autonomous Agents
Hao Cheng, Changtao Miao, Tianle Song, Yin Wu +20 more
SeClaw is a new framework that synthesizes security tasks from structured risk specifications to evaluate autonomous LLM agents' behavior in stateful environments, focusing on the process of unsafe ac…