Yuxi Zhou
1 indexed paper
Recent (6 mo): 1
With code: 0
Influential cites: 0
Benchmarked: 0
Publications per year
2026: 1
Top categories
AI × 1
Research Timeline
2026
Aligned but Fragile: Enhancing LLM Safety Robustness via Zeroth-Order Optimization
The paper proposes a zeroth-order optimization framework to make LLM safety alignment more robust, showing that only a few refinement steps can significantly improve safety while maintaining utility.
Papers
cs.AI · Recent · May 28, 2026
Aligned but Fragile: Enhancing LLM Safety Robustness via Zeroth-Order Optimization
Zhihao Liu, Yifan Wu, Jian Lou, Di Wang +2 more