Prakhar Gupta

2 indexed papers

Top categories

AI ×2 · NLP ×1 · ML ×1 · Crypto ×1

Frequent co-authors

Sohaib Imran (1×)

Jannes Elstner (1×)

David Demitri Africa (1×)

Garv Shah (1×)

Donghua Zhang (1×)

Research Timeline

2026

Self-Mined Hardness for Safety Fine-Tuning

The paper proposes a novel safety fine-tuning method that uses the target model's own rollouts to identify and train on the hardest prompts, significantly reducing jailbreak success rates while maintaining usability.

Consistency Training while Mitigating Obfuscation via Rate Matching

The paper introduces Rate Matching Consistency Training (RMCT), a novel method that improves model robustness against extraneous input cues without forcing the model to ignore those cues, thus preserving monitorability.

Papers

cs.CL · cs.AI · Recent · Jun 1, 2026

Consistency Training while Mitigating Obfuscation via Rate Matching

Sohaib Imran, Prakhar Gupta, Jannes Elstner, David Demitri Africa

cs.LG · cs.AI · cs.CR · Recent · May 4, 2026