Minlie Huang

5 indexed papers

Top categories

AI ×5, Crypto ×4, ML ×3, NLP ×2, Vision ×2

Frequent co-authors

Wenjie Wang (3×)

Qi Zhu (2×)

Fei Mi (2×)

Hongning Wang (2×)

Dongrui Liu (2×)

Yu Li (2×)

Research Timeline

2026

EVA: Editing for Versatile Alignment against Jailbreaks

The paper proposes EVA, a novel framework that uses direct model editing to surgically correct specific neurons responsible for jailbreaking vulnerabilities in LLMs and VLMs, achieving robust safety alignment without performance degradation.

You Live More Than Once: Towards Hierarchical Skill Meta-Evolving

The paper proposes HiSME, a lightweight hierarchical skill meta-evolving solution that jointly optimizes skills and the skill evolving strategy by learning meta-skills from task execution traces, leading to improved agent performance.

AgentDoG 1.5: A Lightweight and Scalable Alignment Framework for AI Agent Safety and Security

The paper introduces AgentDoG 1.5, a lightweight and scalable alignment framework that significantly improves AI agent safety and security for complex, open-world agentic scenarios.

AgentDoG 1.5: A Lightweight and Scalable Alignment Framework for AI Agent Safety and Security

The paper introduces AgentDoG 1.5, a lightweight and scalable alignment framework that significantly improves AI agent safety and security for complex open-world agent deployments.

RUBAS: Rubric-Based Reinforcement Learning for Agent Safety

The paper introduces RUBAS, a rubric-based reinforcement learning framework that improves agent safety by providing fine-grained, multi-dimensional rewards for complex tool-use scenarios.

Papers

cs.LG, cs.AI, cs.CR | Recent | Jun 2, 2026

RUBAS: Rubric-Based Reinforcement Learning for Agent Safety

Xian Qi Loye, Qinglin Su, Zhexin Zhang, Shiyao Cui +4 more

The paper introduces RUBAS, a rubric-based reinforcement learning framework that improves agent safety by providing fine-grained, multi-dimensional rewards for complex tool-use scenarios.

cs.AI, cs.CL, cs.CR | Recent | May 28, 2026