Xiaohao Cai

1 indexed paper

Top categories

ML ×1, Crypto ×1

Frequent co-authors

Bochen Lyu ×1

Yiyang Jia ×1

Zhanxing Zhu ×1

Research Timeline

2026

When Autoregressive Consistency Hurts Safety Alignment

The paper argues that shallow safety alignment in LLMs stems from autoregressive consistency, a mechanism that lets small harmful inputs redirect the model's generation toward unsafe outputs, and that adversarial safety training is therefore necessary.

Papers

cs.LG, cs.CR | Recent | Jun 2, 2026

When Autoregressive Consistency Hurts Safety Alignment

Bochen Lyu, Yiyang Jia, Xiaohao Cai, Zhanxing Zhu
