Satyapriya Krishna
1 indexed paper
Recent (6 mo)
1With code
0Influential cites
0Benchmarked
0Publications per year
126
Top categories
AI×1Crypto×1ML×1
Frequent co-authors
Research Timeline
2026
ARES: Adaptive Red-Teaming and End-to-End Repair of Policy-Reward System
ARES is a novel framework that systematically discovers and mitigates dual vulnerabilities in RLHF systems by simultaneously testing the core LLM and its Reward Model (RM) using structured adversarial prompts, leading to enhanced safety robustness.
Highlighted terms show continued research focus across papers
Papers
cs.AIcs.CRcs.LGRecentApr 20, 2026
ARES: Adaptive Red-Teaming and End-to-End Repair of Policy-Reward System
Jiacheng Liang, Yao Ma, Tharindu Kumarage, Satyapriya Krishna +4 more
ARES is a novel framework that systematically discovers and mitigates dual vulnerabilities in RLHF systems by simultaneously testing the core LLM and its Reward Model (RM) using structured adversarial…
View →