Yuliang Chen
2 indexed papers
Research Timeline
CoopGuard is a novel stateful, multi-round defense framework that uses cooperative agents to significantly reduce the success rate of evolving adversarial attacks against large language models.
The paper introduces EditRisk-Bench, a novel benchmark designed to systematically evaluate the safety risks and downstream reasoning corruption caused by malicious knowledge editing in large language models.
Papers
Benchmarking Safety Risks of Knowledge-Intensive Reasoning under Malicious Knowledge Editing
Qinghua Mao, Xi Lin, Jinze Gu, Jun Wu +2 more