Wei Zhang
24 indexed papers
Research Timeline
MirageBackdoor introduces a novel, highly stealthy backdoor attack that forces Large Language Models to generate correct reasoning steps (Think Well) but output an incorrect final answer (Answer Wrong), bypassing existing detection methods.
The paper proposes efficient Fuzzy Private Set Intersection (FPSI) protocols for various $L_p$ distance metrics by leveraging symmetric-key operations, achieving linear complexity and significantly outperforming existing state-of-the-art methods.
The paper introduces AudioHijack, a framework that successfully demonstrates context-agnostic and imperceptible auditory prompt injection attacks, showing that commercial Large Audio-Language Models can be hijacked with high success rates.
SkCC is a compiler that enables portable and secure development of LLM agent skills by decoupling skill semantics from framework-specific formatting, significantly improving reliability and security.
The paper proposes WARDEN, a distributionally robust adversarial training framework that significantly reduces LLM vulnerability to adversarial attacks by dynamically reweighting hard adversarial examples within a divergence ball.
The paper introduces LeakDojo, a framework that systematically evaluates RAG leakage risks, finding that stronger LLM instruction-following and query generation are major independent contributors to data leakage.
The paper proposes appending a single, fixed safety demonstration at inference time to mitigate the progressive safety degradation that many-shot jailbreak attacks cause in language models.
The paper introduces TESLA, a novel, contactless electromagnetic (EM) side-channel attack that exploits inherent EM emanations from capacitive touchscreens to extract highly sensitive user data like PIN codes and keystrokes.
The paper demonstrates that deep-research agents are vulnerable to poisoning attacks where an adversary can inject malicious content into a single, frequently retrieved user-generated page to compromise the agent's output across multiple related queries.
The paper introduces BITE, a black-box adversarial framework that exploits stylistic biases in LLM judges by adaptively generating semantically equivalent edits to artificially inflate assigned scores.
MemMorph introduces a novel memory poisoning attack that biases LLM agent tool selection by injecting crafted records into the agent's long-term memory, achieving high success rates even against modern defenses.
The paper proposes Fasco, a lightweight confidential container runtime utilizing ARM CCA to significantly reduce startup latency and resource overhead compared to existing microVM-based confidential container architectures.
The paper distinguishes between a model's ability to generate useful updates for external agent components (harness-updating) and its ability to benefit from those updates (harness-benefit), finding that updating capabilities are surprisingly uniform while benefit is maximized in mid-tier models.
The paper introduces Agora, a domain-aware multi-agent framework that successfully detects deep, previously unknown logic bugs in complex consensus protocols, outperforming existing LLM-based analysis methods.
CRITIC-R1 introduces a structured critic framework that treats RAG critique as an explicit error diagnosis problem using reinforcement learning, significantly improving answer quality over strong RAG baselines.
The paper introduces Mindgames, a comprehensive multi-game arena for evaluating LLM agents' sustained social and strategic reasoning, demonstrating that current evaluations are limited by structural scaffolding and error-survival confounds.
The paper introduces PassNet, a large-scale ecosystem for generating compiler passes using LLMs, demonstrating that LLMs can significantly accelerate graph compilation for long-tail workloads and suggesting that consistency is the primary bottleneck.
The paper proposes a feasible-reward-set framework to perform Inverse Reinforcement Learning (IRL) when data comes from multiple imperfect demonstrators, providing theoretical guarantees and practical algorithms.
The paper introduces MineExplorer, a new benchmark in Minecraft, to evaluate the sustained open-world exploration capabilities of MLLM agents, finding that long-horizon coordination remains a significant challenge.
This paper introduces CORE-Bench, a comprehensive benchmark for code retrieval in agentic coding.
Papers
CORE-Bench: A Comprehensive Benchmark for Code Retrieval in the Era of Agentic Coding
Fuwei Zhang, Yanzhao Zhang, Mingxin Li, Dingkun Long +4 more
This paper introduces CORE-Bench, a comprehensive benchmark for code retrieval in agentic coding.