Meiqi Wu
1 indexed paper
Recent (6 mo): 1
With code: 0
Influential cites: 0
Benchmarked: 0
Publications per year: 1 (2026)
Top categories
ML ×1, NLP ×1, Crypto ×1
Research Timeline
2026
Dynamic Adversarial Fine-Tuning Reorganizes Refusal Geometry
The paper investigates how dynamic adversarial fine-tuning (R2D2) reorganizes the internal mechanisms (refusal geometry) of safety-aligned language models, finding that it shifts the optimal refusal control carrier from late to early layers along a robustness-utility frontier.
Papers
cs.LG, cs.CL, cs.CR | Recent | Apr 29, 2026
Dynamic Adversarial Fine-Tuning Reorganizes Refusal Geometry
Wenhao Lan, Shan Li, Xinhua Lai, Meiqi Wu +3 more
The paper investigates how dynamic adversarial fine-tuning (R2D2) reorganizes the internal mechanisms (refusal geometry) of safety-aligned language models, finding that it shifts the optimal refusal c…