Built with and by Teycir Ben Soltane•
How to Use•FAQ•GitHub•arXiv.org•
Share:
ArXivCSExplorer
☆☆Bookmarks🏆RSSHow to UseFAQ
Home/Authors/Satyapriya Krishna

Satyapriya Krishna

1 indexed paper

Recent (6 mo)
1
With code
0
Influential cites
0
Benchmarked
0

Publications per year

1
26

Top categories

AI×1Crypto×1ML×1

Frequent co-authors

Jiacheng Liang1×
Yao Ma1×
Tharindu Kumarage1×
Rahul Gupta1×
Kai-Wei Chang1×
Aram Galstyan1×

Research Timeline

2026
ARES: Adaptive Red-Teaming and End-to-End Repair of Policy-Reward System

ARES is a novel framework that systematically discovers and mitigates dual vulnerabilities in RLHF systems by simultaneously testing the core LLM and its Reward Model (RM) using structured adversarial prompts, leading to enhanced safety robustness.

Highlighted terms show continued research focus across papers

Papers

cs.AIcs.CRcs.LGRecentApr 20, 2026

ARES: Adaptive Red-Teaming and End-to-End Repair of Policy-Reward System

Jiacheng Liang, Yao Ma, Tharindu Kumarage, Satyapriya Krishna +4 more

ARES is a novel framework that systematically discovers and mitigates dual vulnerabilities in RLHF systems by simultaneously testing the core LLM and its Reward Model (RM) using structured adversarial…

View →