Zihan Wang

9 indexed papers

Top categories

AI ×6, NLP ×5, Crypto ×4, Vision ×2, ML ×2, Multiagent ×1, Robotics ×1

Frequent co-authors

Minglai Yang (3×)

Manling Li (2×)

Wenjie Jacky Mo (2×)

Xiaofei Wen (2×)

Rui Cai (2×)

Boyu Zhu (2×)

Research Timeline

2026

Black-Box Skill Stealing Attack from Proprietary LLM Agents: An Empirical Study

This paper presents the first systematic study of black-box skill stealing attacks against proprietary LLM agents, demonstrating that structured agent skills can be easily extracted, posing a significant and often overlooked copyright risk.

Jailbroken Frontier Models Retain Their Capabilities

The paper demonstrates that advanced jailbreaks do not impose a significant 'jailbreak tax' on highly capable frontier language models, which retain near-native performance.

Planning with the Views via Scene Self-Exploration

The paper addresses the challenge of multi-turn view planning for VLMs by proposing an iterative framework that uses self-exploration and view graph distillation, significantly improving planning performance over state-of-the-art models.

BAGEN: Are LLM Agents Budget-Aware?

This paper introduces the concept of Budget-Aware Agents (BAGEN), showing that current LLM agents often fail to manage resources proactively, and proposes that incorporating early stop and interval estimation significantly improves efficiency.

Healthcare Mechanisms from Policy-as-Code Search under Strategic Provider Response

The paper models healthcare mechanism design as program synthesis, demonstrating that an optimized, mixed-objective program can eliminate up-coding and reduce patient rejection while maintaining financial viability.

Triaging Threats to Specialized Guardrails

The paper introduces RouteGuard, a router-expert framework, to improve the robustness and generalization of safety guardrails by specializing threat detection across multiple distinct unsafe categories.

Triaging Threats to Specialized Guardrails

The paper introduces RouteGuard, a router-expert framework, to improve the robustness and generalization of safety guardrails by specializing threat detection across multiple unsafe categories.

Dr. DocBench: A Comprehensive Benchmark for Expert-Level and Difficult Document Parsing

The paper introduces Dr. DocBench, a comprehensive, difficulty-aware benchmark designed to rigorously test VLMs on expert-level and challenging document parsing, and demonstrates that current state-of-the-art models fail on complex, domain-specific structures.

Connecting the Dots: Benchmarking Reflective Memory in Long-Horizon Dialogue

The paper introduces RefMem-Bench, a new benchmark for measuring reflective memory in long-horizon dialogue, and proposes REMIND, a framework that significantly improves models' ability to synthesize fragmented cues into high-level interpretations.

Papers

cs.CL · cs.AI · cs.CV · Recent · May 31, 2026

Dr. DocBench: A Comprehensive Benchmark for Expert-Level and Difficult Document Parsing

Minglai Yang, Xinyan Velocity Yu, Pengyuan Li, Xinyu Guo +21 more

cs.CL · cs.AI · Recent · May 31, 2026