Junbin Yang

1 indexed paper

Recent (6 mo): 1
With code: 0
Influential cites: 0
Benchmarked: 0

Publications per year

2026: 1

Top categories

ML ×1, NLP ×1, Crypto ×1

Frequent co-authors

Wenhao Lan (1×)
Shan Li (1×)
Xinhua Lai (1×)
Meiqi Wu (1×)
Haihua Shen (1×)
Yijun Yang (1×)

Research Timeline

2026
Dynamic Adversarial Fine-Tuning Reorganizes Refusal Geometry

The paper investigates how dynamic adversarial fine-tuning (R2D2) reorganizes the internal mechanisms (refusal geometry) of safety-aligned language models, finding that it shifts the optimal refusal control carrier from late to early layers along a robustness-utility frontier.


Papers

cs.LG, cs.CL, cs.CR • Recent • Apr 29, 2026

Dynamic Adversarial Fine-Tuning Reorganizes Refusal Geometry

Wenhao Lan, Shan Li, Xinhua Lai, Meiqi Wu +3 more
