Bin Zhu
5 indexed papers
Research Timeline
This paper introduces the concept of a 'Sleeper Attack,' demonstrating that adversarial content can persist across multiple interactions with an LLM agent, posing a subtler and harder-to-detect safety threat than single-interaction attacks.
The paper introduces a diagnostic framework to determine whether World-Action Models (WAMs) provide genuinely actionable behavioral improvements beyond simply achieving task success, finding that WAMs often improve object-level behavior but that their gains are architecture-dependent and costly.
The paper proposes a finite-calibration regime map to determine the optimal calibration method (low-dimensional stackers vs. joint tables) for LLM judge panels given limited human labeling budgets, showing that the need for complex interactions dictates the best approach.
The paper proposes the Shortcut Subspace Suppression (S^3) framework to improve deepfake detection generalization by explicitly identifying and suppressing method-specific shortcuts in learned feature representations.
The paper introduces RoboTrustBench, a comprehensive benchmark that evaluates the trustworthiness of video world models for robotic manipulation across challenging scenarios, finding that current models fail at complex reasoning and safety checks.
Papers
Suppressing Forgery-Specific Shortcuts for Generalizable Deepfake Detection
Yihui Wang, Yonghui Yang, Jilong Liu, Fengbin Zhu +2 more
The paper proposes the Shortcut Subspace Suppression (S^3) framework to improve deepfake detection generalization by explicitly identifying and suppressing method-specific shortcuts in learned feature…