ArXivCSExplorer

Prakhar Gupta

2 indexed papers

Recent (6 mo): 2
With code: 0
Influential cites: 0
Benchmarked: 0

Publications per year

2026: 2

Top categories

AI ×2, NLP ×1, ML ×1, Crypto ×1

Frequent co-authors

Sohaib Imran (1×)
Jannes Elstner (1×)
David Demitri Africa (1×)
Garv Shah (1×)
Donghua Zhang (1×)

Research Timeline

2026
Self-Mined Hardness for Safety Fine-Tuning

The paper proposes a novel safety fine-tuning method that uses the target model's own rollouts to identify and train on the hardest prompts, significantly reducing jailbreak success rates while maintaining usability.

Consistency Training while Mitigating Obfuscation via Rate Matching

The paper introduces Rate Matching Consistency Training (RMCT), a novel method that improves model robustness against extraneous input cues without forcing the model to ignore those cues, thus preserving monitorability.

Papers

cs.CL, cs.AI | Recent | Jun 1, 2026

Consistency Training while Mitigating Obfuscation via Rate Matching

Sohaib Imran, Prakhar Gupta, Jannes Elstner, David Demitri Africa

The paper introduces Rate Matching Consistency Training (RMCT), a novel method that improves model robustness against extraneous input cues without forcing the model to ignore those cues, thus preserv…

cs.LG, cs.AI, cs.CR | Recent | May 4, 2026

Self-Mined Hardness for Safety Fine-Tuning

Prakhar Gupta, Garv Shah, Donghua Zhang

The paper proposes a novel safety fine-tuning method that uses the target model's own rollouts to identify and train on the hardest prompts, significantly reducing jailbreak success rates while mainta…
