Rahul Gupta

4 indexed papers

Recent (6 mo)

With code

Influential cites

Benchmarked

Publications per year

Top categories

AI×3NLP×2Crypto×2Society×1ML×1

Frequent co-authors

Yuting Ning1×

Research Timeline

2026

ARES: Adaptive Red-Teaming and End-to-End Repair of Policy-Reward System

ARES is a novel framework that systematically discovers and mitigates dual vulnerabilities in RLHF systems by simultaneously testing the core LLM and its Reward Model (RM) using structured adversarial prompts, leading to enhanced safety robustness.

SWAN: Semantic Watermarking with Abstract Meaning Representation

SWAN introduces a novel, training-free framework that embeds watermarks directly into the semantic structure of a sentence using Abstract Meaning Representation (AMR), achieving superior robustness against paraphrasing compared to existing methods.

PReMISE: Policy Rubrics as Measurement Specifications for LLM Judges

PReMISE introduces a framework to audit and improve the quality of rubrics used to guide LLM judges, demonstrating that it can significantly increase judge accuracy and reduce the exploitability of responses.

SkillHarm: Lifecycle-Aware Skill-Based Attacks via Automated Construction

The paper introduces SkillHarm, a comprehensive benchmark and automated framework for evaluating skill-based attacks across the entire agent skill-use lifecycle, demonstrating that current agents remain highly vulnerable to both fixed-payload and self-mutating poisoning attacks.

Highlighted terms show continued research focus across papers

Papers

cs.CLRecentJun 1, 2026

SkillHarm: Lifecycle-Aware Skill-Based Attacks via Automated Construction

Yuting Ning, Zhehao Zhang, Yash Kumar Lal, Boyu Gou +7 more

View →

cs.AIRecentMay 29, 2026