Minlie Huang
5 indexed papers
Publications per year
Top categories
Frequent co-authors
Research Timeline
The paper proposes EVA, a novel framework that uses direct model editing to surgically correct specific neurons responsible for jailbreaking vulnerabilities in LLMs and VLMs, achieving robust safety alignment without performance degradation.
The paper proposes HiSME, a lightweight hierarchical skill meta-evolving solution that jointly optimizes skills and the skill evolving strategy by learning meta-skills from task execution traces, leading to improved agent performance.
The paper introduces AgentDoG 1.5, a lightweight and scalable alignment framework that significantly improves AI agent safety and security for complex, open-world agentic scenarios.
The paper introduces AgentDoG 1.5, a lightweight and scalable alignment framework that significantly improves AI agent safety and security for complex open-world agent deployments.
The paper introduces RUBAS, a rubric-based reinforcement learning framework that improves agent safety by providing fine-grained, multi-dimensional rewards for complex tool-use scenarios.
Papers
RUBAS: Rubric-Based Reinforcement Learning for Agent Safety
Xian Qi Loye, Qinglin Su, Zhexin Zhang, Shiyao Cui +4 more
The paper introduces RUBAS, a rubric-based reinforcement learning framework that improves agent safety by providing fine-grained, multi-dimensional rewards for complex tool-use scenarios.