Han Qi

8 indexed papers

Top categories

Crypto ×7, AI ×4, Robotics ×1, NLP ×1, Databases ×1, ML ×1

Frequent co-authors

Zhan Qin (5×)

Kui Ren (4×)

Weiwei Qi (2×)

Tianhang Zheng (2×)

Rachel Luo (1×)

Michael Watson (1×)

Research Timeline

2026

Towards Identification and Intervention of Safety-Critical Parameters in Large Language Models

The paper proposes the Expected Safety Impact (ESI) framework to identify safety-critical parameters in LLMs, introducing targeted tuning methods (SET and SPA) to enhance safety and preserve alignment during model adaptation.

Channel-Level Semantic Perturbations: Unlearnable Examples for Diverse Training Paradigms

This paper systematically investigates unlearnable examples (UEs) across diverse training paradigms, finding that existing UEs fail under pretraining-finetuning (PF) settings, and proposes Shallow Semantic Camouflage (SSC) to maintain unlearnability.

Defense against Poisoning Attacks under Shuffle-DP

The paper proposes the first general defense framework to make all union-preserving Differential Privacy (DP) protocols, specifically those based on shuffle-DP, resilient against poisoning attacks.

LeakDojo: Decoding the Leakage Threats of RAG Systems

The paper introduces LeakDojo, a framework that systematically evaluates RAG leakage risks, finding that stronger LLM instruction-following and query generation are major independent contributors to data leakage.

Do Coding Agents Understand Least-Privilege Authorization?

The paper introduces a new benchmark and a decomposition method, Sufficiency-Tightness Decomposition. It shows that current coding agents struggle to infer least-privilege authorization accurately, and that the decomposition significantly improves both security and task success.

EVA: Editing for Versatile Alignment against Jailbreaks

The paper proposes EVA, a novel framework that uses direct model editing to surgically correct specific neurons responsible for jailbreaking vulnerabilities in LLMs and VLMs, achieving robust safety alignment without performance degradation.

TRACE: Task-Aware Adaptive Self-Evolving Agentic Jailbreaking

The paper proposes TRACE, a novel agentic jailbreaking framework that successfully bypasses safety mechanisms of advanced LLM agents by decomposing malicious tasks and disguising harmful subtasks within task-aware, iteratively evolved scenarios.

X4Val: Learning Neural Surrogates for Variance-Reduced Policy Evaluation

This paper introduces X4Val, a framework for variance-reduced real-world metric estimation using non-paired, multi-domain data.

Papers

cs.RO · Recent · Jun 3, 2026

X4Val: Learning Neural Surrogates for Variance-Reduced Policy Evaluation

Rachel Luo, Michael Watson, Apoorva Sharma, Heng Yang +5 more

This paper introduces X4Val, a framework for variance-reduced real-world metric estimation using non-paired, multi-domain data.

cs.CR · Recent · May 29, 2026