Zihao Wang
5 indexed papers
Research Timeline
LocalAlign proposes a generalizable prompt injection defense by generating near-target adversarial examples, which enforces a tighter robustness boundary around the correct model response.
Misrouter introduces an input-only adversarial framework to exploit the routing mechanisms of Mixture-of-Experts (MoE) LLMs, enabling unsafe behavior induction against remotely hosted, black-box services.
DP-SelFT is a framework for differentially private selective fine-tuning that improves the privacy-utility trade-off for LLMs by selecting robust parameter subsets.
SCAgent is an automated framework that uses LLM-assisted agents to systematically discover, analyze, and assess side-channel leakage risks in complex systems such as iOS, moving beyond manual and predefined analysis.
PatchWorld introduces a gradient-free framework to create executable Python world models from offline trajectories, achieving high planning scores by inducing symbolic belief-state programs.
Papers
PatchWorld: Gradient-Free Optimization of Executable World Models
Jiaxin Bai, Yue Guo, Yifei Dong, Jiaxuan Xiong +12 more