Aram Galstyan
2 indexed papers
Publications per year
Top categories
Frequent co-authors
Research Timeline
ARES is a novel framework that systematically discovers and mitigates dual vulnerabilities in RLHF systems by simultaneously testing the core LLM and its Reward Model (RM) using structured adversarial prompts, leading to enhanced safety robustness.
SWAN introduces a novel, training-free framework that embeds watermarks directly into the semantic structure of a sentence using Abstract Meaning Representation (AMR), achieving superior robustness against paraphrasing compared to existing methods.
Papers
SWAN: Semantic Watermarking with Abstract Meaning Representation
SWAN introduces a novel, training-free framework that embeds watermarks directly into the semantic structure of a sentence using Abstract Meaning Representation (AMR), achieving superior robustness ag…