Haoxuan Ji
2 indexed papers
Publications per year
Top categories
Frequent co-authors
Research Timeline
The paper introduces an embedding disruption method to re-activate and strengthen built-in safeguards within LLMs, effectively detecting and defending against sophisticated jailbreak attacks.
The paper introduces Disrupt-and-Rectify Smoothing (DR-Smoothing), a novel two-stage defense mechanism that significantly improves LLM security against jailbreaking attacks by restoring disrupted inputs to a safe, in-distribution form.
Papers
Re-Triggering Safeguards within LLMs for Jailbreak Detection
Zheng Lin, Zhenxing Niu, Haoxuan Ji, Yuzhe Huang +1 more
The paper introduces an embedding disruption method to re-activate and strengthen built-in safeguards within LLMs, effectively detecting and defending against sophisticated jailbreak attacks.