Tianhang Zheng

7 indexed papers

Recent (6 mo)

With code

Influential cites

Benchmarked

Publications per year

Top categories

Crypto×7ML×5AI×3NLP×3Vision×2Architecture×1

Frequent co-authors

Chaochao Lu3×

Kui Ren3×

Weiwei Qi2×

Zhan Qin2×

Dongrui Liu2×

Yu Li2×

Research Timeline

2026

Towards Identification and Intervention of Safety-Critical Parameters in Large Language Models

The paper proposes the Expected Safety Impact (ESI) framework to identify safety-critical parameters in LLMs, introducing targeted tuning methods (SET and SPA) to enhance safety and preserve alignment during model adaptation.

Unveiling the Backdoor Mechanism Hidden Behind Catastrophic Overfitting in Fast Adversarial Training

This paper reinterprets catastrophic overfitting (CO) in Fast Adversarial Training (FAT) as a weak backdoor mechanism, proposing backdoor-inspired strategies to mitigate this generalization failure.

Mitigating Error Amplification in Fast Adversarial Training

The paper proposes a Distribution-aware Dynamic Guidance (DDG) strategy to mitigate catastrophic overfitting and the robustness-accuracy trade-off inherent in Fast Adversarial Training (FAT) by dynamically adjusting perturbation budgets and supervision signals based on sample confidence.

RouteScan: A Non-Intrusive Approach to Auditing MoE LLMs Safety via Expert Routing Telemetry

RouteScan introduces a non-intrusive framework that audits the safety of Mixture-of-Experts (MoE) LLMs by analyzing low-level GPU expert routing telemetry, achieving high accuracy even on unseen harmful prompts.

AgentDoG 1.5: A Lightweight and Scalable Alignment Framework for AI Agent Safety and Security

The paper introduces AgentDoG 1.5, a lightweight and scalable alignment framework that significantly improves AI agent safety and security for complex, open-world agentic scenarios.

AgentDoG 1.5: A Lightweight and Scalable Alignment Framework for AI Agent Safety and Security

The paper introduces AgentDoG 1.5, a lightweight and scalable alignment framework that significantly improves AI agent safety and security for complex open-world agent deployments.

TRACE: Task-Aware Adaptive Self-Evolving Agentic Jailbreaking

The paper proposes TRACE, a novel agentic jailbreaking framework that successfully bypasses safety mechanisms of advanced LLM agents by decomposing malicious tasks and disguising harmful subtasks within task-aware, iteratively evolved scenarios.

Highlighted terms show continued research focus across papers

Papers

cs.CRRecentMay 29, 2026

TRACE: Task-Aware Adaptive Self-Evolving Agentic Jailbreaking

Churui Zeng, Weiwei Qi, Kedong Xiu, Tianhang Zheng +4 more

View →

cs.AIcs.CLcs.CRRecentMay 28, 2026