Han Qi
8 indexed papers
Publications per year
Top categories
Frequent co-authors
Research Timeline
The paper proposes the Expected Safety Impact (ESI) framework to identify safety-critical parameters in LLMs, introducing targeted tuning methods (SET and SPA) to enhance safety and preserve alignment during model adaptation.
This paper systematically investigates unlearnable examples (UEs) across diverse training paradigms, finding that existing UEs fail under pretraining-finetuning (PF) settings, and proposes Shallow Semantic Camouflage (SSC) to maintain unlearnability.
The paper proposes the first general defense framework to make all union-preserving Differential Privacy (DP) protocols, specifically those based on shuffle-DP, resilient against poisoning attacks.
The paper introduces LeakDojo, a framework that systematically evaluates RAG leakage risks, finding that stronger LLM instruction-following and query generation are major independent contributors to data leakage.
The paper introduces a new benchmark and decomposition method, Sufficiency-Tightness Decomposition, demonstrating that current coding agents struggle to accurately infer least-privilege authorization, and that this decomposition significantly improves both security and task success.
The paper proposes EVA, a novel framework that uses direct model editing to surgically correct specific neurons responsible for jailbreaking vulnerabilities in LLMs and VLMs, achieving robust safety alignment without performance degradation.
The paper proposes TRACE, a novel agentic jailbreaking framework that successfully bypasses safety mechanisms of advanced LLM agents by decomposing malicious tasks and disguising harmful subtasks within task-aware, iteratively evolved scenarios.
This paper introduces X4Val, a framework for variance-reduced real-world metric estimation using non-paired, multi-domain data.
Papers
X4Val: Learning Neural Surrogates for Variance-Reduced Policy Evaluation
Rachel Luo, Michael Watson, Apoorva Sharma, Heng Yang +5 more
This paper introduces X4Val, a framework for variance-reduced real-world metric estimation using non-paired, multi-domain data.