Yue Zhao
10 indexed papers
Publications per year
Top categories
Frequent co-authors
Research Timeline
Defense training for LLM agents, intended to improve safety, systematically degrades their core competence, leading to unreliability in multi-step tasks.
The paper introduces Cross-Model Neuron Transfer (CNT), a post-hoc method that efficiently transfers safety-oriented functionalities between different large language models by transferring minimal subsets of neurons, achieving high performance with minimal degradation.
Agent Audit is a novel security analysis system that comprehensively audits LLM agent applications by examining the entire software stack—including tool code, configuration, and prompts—to detect a wide range of vulnerabilities.
The paper proposes an infrastructure, clawgang and meowtrade, to transform private, non-transferable agent memories into verifiable, tradable economic commodities.
This paper identifies and analyzes unintentional cross-user contamination (UCC), a failure mode where benign, scope-bound artifacts degrade the outcomes of different users in shared-state LLM agents, requiring artifact-level defenses.
The paper introduces Latent Policy Guardrail (LPG), a novel framework that efficiently enforces dynamic safety policies for LLMs by compressing complex policy deliberation into a small set of latent tokens, achieving high accuracy with significantly reduced latency.
GEO-Bench introduces a standardized benchmark to compare various ranking manipulation attacks (both black-box and white-box) on generative engines, demonstrating that black-box content rewriting can be highly effective and stealthy.
The paper introduces OR-Space, a novel full-lifecycle workspace benchmark designed to rigorously evaluate industrial optimization agents by simulating real-world, multi-stage OR workflows that go beyond simple model translation.
The paper introduces GEO-Bench, a unified benchmark that standardizes the evaluation of various generative engine optimization (GEO) ranking manipulation attacks, demonstrating that black-box content rewriting can be highly effective and stealthy compared to gradient-based methods.
MaskForge is a novel, adaptive, black-box attack framework that significantly improves jailbreaking diffusion large language models (dLLMs) by treating red-teaming as an optimized search over reusable structural patterns.
Papers
MaskForge: Structure-Aware Adaptive Attacks for Jailbreaking Diffusion Large Language Models
Yingzi Ma, Zhengyue Zhao, Xiaogeng Liu, Minhui Xue +2 more
MaskForge is a novel, adaptive, black-box attack framework that significantly improves jailbreaking diffusion large language models (dLLMs) by treating red-teaming as an optimized search over reusable…