Manik Bhandari
1 indexed paper
Recent (6 mo): 1
With code: 0
Influential cites: 0
Benchmarked: 0
Publications per year
2026: 1
Top categories
NLP×1
Research Timeline
2026
Configurable Reward Model for Balanced Safety Alignment
The paper introduces the Configurable Safety Reward Model (CSRM), a novel reward model that can be jointly optimized for calibrated safety compliance and reward modeling, significantly improving LLM safety alignment across diverse and unseen safety configurations.