Built with and by Teycir Ben Soltane•
How to Use•FAQ•GitHub•arXiv.org•
Share:
ArXivCSExplorer
☆☆Bookmarks🏆RSSHow to UseFAQ
Home/Authors/Donghua Zhang

Donghua Zhang

1 indexed paper

Recent (6 mo)
1
With code
0
Influential cites
0
Benchmarked
0

Publications per year

1
26

Top categories

ML×1AI×1Crypto×1

Frequent co-authors

Prakhar Gupta1×
Garv Shah1×

Research Timeline

2026
Self-Mined Hardness for Safety Fine-Tuning

The paper proposes a novel safety fine-tuning method that uses the target model's own rollouts to identify and train on the hardest prompts, significantly reducing jailbreak success rates while maintaining usability.

Highlighted terms show continued research focus across papers

Papers

cs.LGcs.AIcs.CRRecentMay 4, 2026

Self-Mined Hardness for Safety Fine-Tuning

Prakhar Gupta, Garv Shah, Donghua Zhang

The paper proposes a novel safety fine-tuning method that uses the target model's own rollouts to identify and train on the hardest prompts, significantly reducing jailbreak success rates while mainta…

View →