Kui Ren

10 indexed papers

Recent (6 mo)

With code

Influential cites

Benchmarked

Publications per year

Top categories

Crypto×9ML×3NLP×2AI×2Architecture×1Databases×1Sound×1

Frequent co-authors

Zhan Qin4×

Tianhang Zheng3×

Zhibo Wang3×

Weiwei Qi2×

Yaopeng Wang2×

Huiyu Xu2×

Research Timeline

2026

STEP: Detecting Audio Backdoor Attacks via Stability-based Trigger Exposure Profiling

STEP introduces a novel, black-box, retraining-free detector that profiles audio samples using dual perturbation branches to detect backdoor attacks by exploiting the characteristic instability of hidden triggers.

Towards Identification and Intervention of Safety-Critical Parameters in Large Language Models

The paper proposes the Expected Safety Impact (ESI) framework to identify safety-critical parameters in LLMs, introducing targeted tuning methods (SET and SPA) to enhance safety and preserve alignment during model adaptation.

Channel-Level Semantic Perturbations: Unlearnable Examples for Diverse Training Paradigms

This paper systematically investigates unlearnable examples (UEs) across diverse training paradigms, finding that existing UEs fail under pretraining-finetuning (PF) settings, and proposes Shallow Semantic Camouflage (SSC) to maintain unlearnability.

Defense against Poisoning Attacks under Shuffle-DP

The paper proposes the first general defense framework to make all union-preserving Differential Privacy (DP) protocols, specifically those based on shuffle-DP, resilient against poisoning attacks.

LoopTrap: Termination Poisoning Attacks on LLM Agents

The paper introduces LoopTrap, an automated red-teaming framework that demonstrates how malicious prompts can poison the termination judgment of LLM agents, causing unbounded computation.

"Training robust watermarking model may hurt authentication!'' Exploring and Mitigating the Identity Leakage in Robust Watermarking

The paper proposes W-IR, a novel watermarking framework that simultaneously achieves high certified robustness against adversarial attacks and effectively mitigates identity leakage in watermarked images.

RouteScan: A Non-Intrusive Approach to Auditing MoE LLMs Safety via Expert Routing Telemetry

RouteScan introduces a non-intrusive framework that audits the safety of Mixture-of-Experts (MoE) LLMs by analyzing low-level GPU expert routing telemetry, achieving high accuracy even on unseen harmful prompts.

LoRA-Key: User-Centric LoRA Watermarking for Text-to-Image Diffusion Models

LoRA-Key introduces a user-centric watermarking framework that attaches a recoverable ownership key to LoRA modules via a standalone Watermark LoRA, providing lightweight, plug-and-play copyright protection without requiring per-LoRA retraining.

ConsisGuard: Aligning Safety Deliberation with Policy Enforcement in LLM Guardrails

The paper introduces ConsisGuard, a framework that addresses the 'deliberation-to-enforcement gap' in LLM guardrails by ensuring that the reasoning process is faithfully and consistently translated into the final safety decision.

TRACE: Task-Aware Adaptive Self-Evolving Agentic Jailbreaking

The paper proposes TRACE, a novel agentic jailbreaking framework that successfully bypasses safety mechanisms of advanced LLM agents by decomposing malicious tasks and disguising harmful subtasks within task-aware, iteratively evolved scenarios.

Highlighted terms show continued research focus across papers

Papers

cs.CLRecentMay 29, 2026

ConsisGuard: Aligning Safety Deliberation with Policy Enforcement in LLM Guardrails

Yan Wang, Zhixuan Chu, Zihao Xue, Zhen Bi +8 more

View →

cs.CRRecentMay 29, 2026