Zhi Wang
5 indexed papers
Research Timeline
The paper surveys the use of LLMs for agentic NetOps and AIOps, arguing that operational reliability depends not on the model itself, but on robust surrounding machinery and workflow-centered evaluation.
The paper introduces Babel, an efficient black-box attack framework that systematically exploits intrinsic safety gaps in LLMs by optimizing text obfuscation sampling, achieving state-of-the-art jailbreak success rates on commercial models.
The paper introduces CORDON-MAS, a compartmentalized framework that defends Retrieval-Augmented Generation (RAG) against knowledge poisoning by enforcing strict information-flow control, significantly reducing attack success rates.
This study finds that when users do not specify a jurisdiction, the language of the prompt strongly biases the LLM's response toward a specific national legal framework (U.S. law for English prompts, Chinese law for Mandarin Chinese prompts), creating a risk of institutional misselection.
This paper introduces CHERRL, a controllable hacking environment for rubric-based reinforcement learning to study and mitigate reward hacking.
Papers
Reproducing, Analyzing, and Detecting Reward Hacking in Rubric-Based Reinforcement Learning
Xuekang Wang, Zhuoyuan Hao, Shuo Hou, Hao Peng +2 more