Donghua Zhang

1 indexed paper

Top categories

ML ×1, AI ×1, Crypto ×1

Frequent co-authors

Prakhar Gupta (1×)

Garv Shah (1×)

Research Timeline

2026

Self-Mined Hardness for Safety Fine-Tuning

The paper proposes a safety fine-tuning method that uses the target model's own rollouts to identify the hardest prompts and then trains on them, significantly reducing jailbreak success rates while maintaining usability.

Papers

cs.LG, cs.AI, cs.CR · May 4, 2026

Self-Mined Hardness for Safety Fine-Tuning

Prakhar Gupta, Garv Shah, Donghua Zhang
