ArXivCSExplorer

Xiaohao Cai

1 indexed paper

Recent (6 mo): 1
With code: 0
Influential cites: 0
Benchmarked: 0

Publications per year

2026: 1

Top categories

ML ×1, Crypto ×1

Frequent co-authors

Bochen Lyu (1×)
Yiyang Jia (1×)
Zhanxing Zhu (1×)

Research Timeline

2026
When Autoregressive Consistency Hurts Safety Alignment

The paper argues that shallow safety alignment in LLMs is due to autoregressive consistency, a mechanism that allows small harmful inputs to redirect the model's generation to unsafe outputs, necessitating adversarial safety training.


Papers

cs.LG, cs.CR · Recent · Jun 2, 2026

When Autoregressive Consistency Hurts Safety Alignment

Bochen Lyu, Yiyang Jia, Xiaohao Cai, Zhanxing Zhu

