Yuanhao Liu
1 indexed paper
Recent (6 mo)
1With code
0Influential cites
0Benchmarked
0Research Timeline
2026
Why Do Aligned LLMs Remain Jailbreakable: Refusal-Escape Directions, Operator-Level Sources, and Safety-Utility Trade-off
The paper theorizes that aligned LLMs remain jailbreakable due to 'Refusal-Escape Directions' (RED), which are continuous perturbation paths that shift model behavior from refusal to answering, and shows this vulnerability is linked to specific operator-level sources within the model architecture.
Highlighted terms show continued research focus across papers