Zhi Li

12 indexed papers

Top categories

AI ×8, Crypto ×8, NLP ×3, Vision ×3, Audio and Speech Processing ×1, Prog. Lang. ×1

Frequent co-authors

Hongbo Wen 4×

Yanju Chen 4×

Hanzhi Liu 4×

Yu Feng 4×

Chaofan Shou 3×

Ying Li 2×

Research Timeline

2026

When Convenience Becomes Risk: A Semantic View of Under-Specification in Host-Acting Agents

The paper argues that the convenience of host-acting agents encourages semantically under-specified user goals, which forces the agent to generate potentially risky execution plans.

Your Agent Is Mine: Measuring Malicious Intermediary Attacks on the LLM Supply Chain

This paper systematically analyzes the threat posed by malicious third-party API routers in the LLM supply chain, finding that a significant number of routers actively perform payload injection, credential theft, and cryptocurrency draining.

Synthesizing Multi-Agent Harnesses for Vulnerability Discovery

The paper introduces AgentFlow, a novel framework that uses a typed graph DSL and feedback-driven optimization to automatically synthesize and improve multi-agent harnesses for discovering security vulnerabilities.

Semia: Auditing Agent Skills via Constraint-Guided Representation Synthesis

Semia is a novel static auditor that translates complex, prose-defined agent skills into a verifiable Datalog fact base, enabling the detection of critical security vulnerabilities in real-world LLM agents.

Checkerboard: A Simple, Effective, Efficient and Learning-free Clean Label Backdoor Attack with Low Poisoning Budget

The paper introduces Checkerboard, a learning-free clean-label backdoor attack that efficiently poisons training data to compromise model integrity with a minimal poisoning budget.

Constraining Host-Level Abuse in Self-Hosted Computer-Use Agents via TEE-Backed Isolation

The paper proposes an operation-centric, TEE-backed isolation model to constrain self-hosted computer-use agents, preventing malicious or unsafe host-level operations without sacrificing general functionality.

No Attack Required: Semantic Fuzzing for Specification Violations in Agent Skills

The paper introduces Sefz, a semantic fuzzing framework that automatically discovers specification violations in LLM agent skills, finding a significant number of previously unknown exploitable guardrail breaches.

Towards trustworthy agentic AI: a comprehensive survey of safety, robustness, privacy, and system security

This survey provides a comprehensive, practical guide to ensuring the trustworthiness of complex, autonomous agentic AI systems by focusing on safety, robustness, privacy, and system security.

MUSE: Benchmarking Manufacturable, Functional, and Assemblable Text-to-CAD Generation

The paper introduces MUSE, a comprehensive benchmark that evaluates Text-to-CAD generation by assessing complex assemblies based on functionality, manufacturability, and assemblability, moving beyond simple geometric matching.

Benchmarking Large Vision-Language Models on CFMME: A Comprehensive Chinese Financial Multimodal Evaluation Dataset

This paper introduces CFMME, a comprehensive Chinese financial multimodal benchmark, and evaluates current Large Vision-Language Models (LVLMs), finding that while state-of-the-art models perform moderately, there is significant room for improvement in handling complex financial multimodal tasks.

Dr. DocBench: A Comprehensive Benchmark for Expert-Level and Difficult Document Parsing

The paper introduces Dr. DocBench, a difficulty-aware, comprehensive benchmark designed to rigorously test expert-level and challenging document parsing capabilities for VLMs, demonstrating that current state-of-the-art models fail on complex, domain-specific structures.

PolySpeech-100: A Large-Scale Benchmark for Speech Understanding Across 100+ Languages and Dialects

PolySpeech-100 is a large-scale multilingual benchmark covering 110 linguistic variants, built to rigorously test Speech-LLMs. The paper reports that open-source models struggle with low-resource languages and that direct audio processing outperforms cascaded ASR+LLM systems.


Papers

cs.CL, cs.AI, cs.CV · Recent · May 31, 2026

Dr. DocBench: A Comprehensive Benchmark for Expert-Level and Difficult Document Parsing

Minglai Yang, Xinyan Velocity Yu, Pengyuan Li, Xinyu Guo +21 more


cs.CL, cs.AI, eess.AS · Recent · May 31, 2026