Peijie Sun
1 indexed paper
Recent (6 mo): 1
With code: 0
Influential cites: 0
Benchmarked: 0
Research Timeline
2026
Safety Anchor: Defending Harmful Fine-tuning via Geometric Bottlenecks
The paper introduces Safety Bottleneck Regularization (SBR), a defense that anchors LLM safety by constraining the unembedding layer, preventing harmful fine-tuning (HFT) even when other defenses fail.
Papers
cs.CR · cs.AI · cs.CL · Recent · May 7, 2026
Safety Anchor: Defending Harmful Fine-tuning via Geometric Bottlenecks
Guoxin Lu, Letian Sha, Qing Wang, Peijie Sun +3 more