Xianglin Yang
2 indexed papers
Research Timeline
The paper introduces POLARIS, a novel framework that systematically generates comprehensive and verifiable safety tests for LLMs by formalizing natural language policies into First-Order Logic and exploring the resulting Semantic Policy Graph.
The paper introduces BITE, a black-box adversarial framework that exploits stylistic biases in LLM judges by adaptively generating semantically equivalent edits to artificially inflate assigned scores.
Papers
Inverting the Shield: Systematically Generating Safety Tests from Policy Specifications
Xiaoyue Lu, Xianglin Yang, Haijun Liu, Jiahao Liu +3 more
The paper introduces POLARIS, a novel framework that systematically generates comprehensive and verifiable safety tests for LLMs by formalizing natural language policies into First-Order Logic and exp…