Tharindu Kumarage
2 indexed papers
Research Timeline
ARES is a novel framework that systematically discovers and mitigates dual vulnerabilities in RLHF systems, that is, weaknesses in both the core LLM and its Reward Model (RM). It tests the two simultaneously using structured adversarial prompts, which improves safety robustness.
PReMISE introduces a framework to audit and improve the quality of rubrics used to guide LLM judges, demonstrating that it can significantly increase judge accuracy and reduce the exploitability of responses.
Papers
PReMISE: Policy Rubrics as Measurement Specifications for LLM Judges
Swastik Roy, Rajkumar Pujari, Tharindu Kumarage, Charith Peris +4 more