Wenhan Chang
2 indexed papers
Publications per year
Top categories
Frequent co-authors
Research Timeline
The paper proposes Two-stage Backdoor Hijacking (TSBH) to create persistent, trigger-activated malicious behaviors by manipulating the observable Chain-of-Thought (CoT) process in Large Language Models.
The paper proposes Safety Context Injection (SCI), an inference-time framework that prepends a structured external risk report to protect Large Reasoning Models (LRMs) against sophisticated jailbreaks, significantly reducing attack success rates.
Papers
Safety Context Injection: Inference-Time Safety Alignment via Static Filtering and Agentic Analysis
Zhenhao Xu, Wenhan Chang, Yichuan Chen, Yuxin Fang +2 more
The paper proposes Safety Context Injection (SCI), an inference-time framework that prepends a structured external risk report to protect Large Reasoning Models (LRMs) against sophisticated jailbreaks…