Built with and by Teycir Ben Soltane•
How to Use•FAQ•GitHub•arXiv.org•
Share:
ArXivCSExplorer
☆☆Bookmarks🏆RSSHow to UseFAQ
Home/Authors/Zhi Li

Zhi Li

12 indexed papers

Recent (6 mo)
12
With code
0
Influential cites
0
Benchmarked
0

Publications per year

12
26

Top categories

AI×8Crypto×8NLP×3Vision×3Audio and Speech Processing×1Prog. Lang.×1

Frequent co-authors

Hongbo Wen4×
Yanju Chen4×
Hanzhi Liu4×
Yu Feng4×
Chaofan Shou3×
Ying Li2×

Research Timeline

2026
When Convenience Becomes Risk: A Semantic View of Under-Specification in Host-Acting Agents

The paper identifies that the convenience of host-acting agents leads to semantic under-specification in user goals, which forces the agent to generate potentially risky execution plans.

Your Agent Is Mine: Measuring Malicious Intermediary Attacks on the LLM Supply Chain

This paper systematically analyzes the threat posed by malicious third-party API routers in the LLM supply chain, finding that a significant number of routers actively perform payload injection, credential theft, and cryptocurrency draining.

Synthesizing Multi-Agent Harnesses for Vulnerability Discovery

The paper introduces AgentFlow, a novel framework that uses a typed graph DSL and feedback-driven optimization to automatically synthesize and improve multi-agent harnesses for discovering security vulnerabilities.

Semia: Auditing Agent Skills via Constraint-Guided Representation Synthesis

Semia is a novel static auditor that translates complex, prose-defined agent skills into a verifiable Datalog fact base, enabling the detection of critical security vulnerabilities in real-world LLM agents.

Checkerboard: A Simple, Effective, Efficient and Learning-free Clean Label Backdoor Attack with Low Poisoning Budget

The paper introduces Checkerboard, a novel, learning-free clean-label backdoor attack that efficiently poisons training data to compromise model integrity with minimal poisoning budget.

Constraining Host-Level Abuse in Self-Hosted Computer-Use Agents via TEE-Backed Isolation

The paper proposes an operation-centric, TEE-backed isolation model to constrain self-hosted computer-use agents, preventing malicious or unsafe host-level operations without sacrificing general functionality.

No Attack Required: Semantic Fuzzing for Specification Violations in Agent Skills

The paper introduces Sefz, a semantic fuzzing framework that automatically discovers specification violations in LLM agent skills, finding a significant number of previously unknown exploitable guardrail breaches.

Towards trustworthy agentic AI: a comprehensive survey of safety, robustness, privacy, and system security

This survey provides a comprehensive, practical guide to ensuring the trustworthiness of complex, autonomous agentic AI systems by focusing on safety, robustness, privacy, and system security.

MUSE: Benchmarking Manufacturable, Functional, and Assemblable Text-to-CAD Generation

The paper introduces MUSE, a comprehensive benchmark that evaluates Text-to-CAD generation by assessing complex assemblies based on functionality, manufacturability, and assemblability, moving beyond simple geometric matching.

Benchmarking Large Vision-Language Models on CFMME: A Comprehensive Chinese Financial Multimodal Evaluation Dataset

This paper introduces CFMME, a comprehensive Chinese financial multimodal benchmark, and evaluates current Large Vision-Language Models (LVLMs), finding that while state-of-the-art models perform moderately, there is significant room for improvement in handling complex financial multimodal tasks.

Dr. DocBench: A Comprehensive Benchmark for Expert-Level and Difficult Document Parsing

The paper introduces Dr. DocBench, a difficulty-aware, comprehensive benchmark designed to rigorously test expert-level and challenging document parsing capabilities for VLMs, demonstrating that current state-of-the-art models fail on complex, domain-specific structures.

PolySpeech-100: A Large-Scale Benchmark for Speech Understanding Across 100+ Languages and Dialects

PolySpeech-100 introduces a massive, multi-lingual benchmark covering 110 linguistic variants to rigorously test Speech-LLMs, demonstrating that open-source models struggle with low-resource languages and that direct audio processing is superior to cascaded ASR+LLM systems.

Highlighted terms show continued research focus across papers

Papers

cs.CLcs.AIcs.CVRecentMay 31, 2026

Dr. DocBench: A Comprehensive Benchmark for Expert-Level and Difficult Document Parsing

Minglai Yang, Xinyan Velocity Yu, Pengyuan Li, Xinyu Guo +21 more

The paper introduces Dr. DocBench, a difficulty-aware, comprehensive benchmark designed to rigorously test expert-level and challenging document parsing capabilities for VLMs, demonstrating that curre…

View →
cs.CLcs.AIeess.ASRecentMay 31, 2026

PolySpeech-100: A Large-Scale Benchmark for Speech Understanding Across 100+ Languages and Dialects

Sicheng Yang, Shulan Ruan, Shiwei Wu, Yu Liu +3 more

PolySpeech-100 introduces a massive, multi-lingual benchmark covering 110 linguistic variants to rigorously test Speech-LLMs, demonstrating that open-source models struggle with low-resource languages…

View →
cs.CVcs.AIRecentMay 28, 2026

Benchmarking Large Vision-Language Models on CFMME: A Comprehensive Chinese Financial Multimodal Evaluation Dataset

Qian Chen, Xianyin Zhang, Yanzhi Liu, Lifan Guo +2 more

This paper introduces CFMME, a comprehensive Chinese financial multimodal benchmark, and evaluates current Large Vision-Language Models (LVLMs), finding that while state-of-the-art models perform mode…

View →
cs.AIRecentMay 27, 2026

MUSE: Benchmarking Manufacturable, Functional, and Assemblable Text-to-CAD Generation

Xiaoyu Dong, Zhi Li, Xiao-Ming Wu

The paper introduces MUSE, a comprehensive benchmark that evaluates Text-to-CAD generation by assessing complex assemblies based on functionality, manufacturability, and assemblability, moving beyond…

View →
cs.AIcs.CLcs.CRRecentMay 17, 2026

Towards trustworthy agentic AI: a comprehensive survey of safety, robustness, privacy, and system security

Jinhu Qi, Muzhi Li, Jiahong Liu, Yuqin Shu +8 more

This survey provides a comprehensive, practical guide to ensuring the trustworthiness of complex, autonomous agentic AI systems by focusing on safety, robustness, privacy, and system security.

View →
cs.CRcs.AIRecentMay 13, 2026

No Attack Required: Semantic Fuzzing for Specification Violations in Agent Skills

Ying Li, Hongbo Wen, Yanju Chen, Hanzhi Liu +2 more

The paper introduces Sefz, a semantic fuzzing framework that automatically discovers specification violations in LLM agent skills, finding a significant number of previously unknown exploitable guardr…

View →
cs.CRRecentMay 7, 2026

Constraining Host-Level Abuse in Self-Hosted Computer-Use Agents via TEE-Backed Isolation

Di Lu, Bo Zhang, Xiyuan Li, Yongzhi Liao +4 more

The paper proposes an operation-centric, TEE-backed isolation model to constrain self-hosted computer-use agents, preventing malicious or unsafe host-level operations without sacrificing general funct…

View →
cs.CRcs.CVRecentMay 2, 2026

Checkerboard: A Simple, Effective, Efficient and Learning-free Clean Label Backdoor Attack with Low Poisoning Budget

Yi Yang, Jinyang Huang, Binbin Liu, Feng-Qi Cui +4 more

The paper introduces Checkerboard, a novel, learning-free clean-label backdoor attack that efficiently poisons training data to compromise model integrity with minimal poisoning budget.

View →
cs.CRcs.AIcs.PLRecentMay 1, 2026

Semia: Auditing Agent Skills via Constraint-Guided Representation Synthesis

Hongbo Wen, Ying Li, Hanzhi Liu, Chaofan Shou +3 more

Semia is a novel static auditor that translates complex, prose-defined agent skills into a verifiable Datalog fact base, enabling the detection of critical security vulnerabilities in real-world LLM a…

View →
cs.CRRecentApr 22, 2026

Synthesizing Multi-Agent Harnesses for Vulnerability Discovery

Hanzhi Liu, Chaofan Shou, Xiaonan Liu, Hongbo Wen +3 more

The paper introduces AgentFlow, a novel framework that uses a typed graph DSL and feedback-driven optimization to automatically synthesize and improve multi-agent harnesses for discovering security vu…

View →
cs.CRRecentApr 9, 2026

Your Agent Is Mine: Measuring Malicious Intermediary Attacks on the LLM Supply Chain

Hanzhi Liu, Chaofan Shou, Hongbo Wen, Yanju Chen +2 more

This paper systematically analyzes the threat posed by malicious third-party API routers in the LLM supply chain, finding that a significant number of routers actively perform payload injection, crede…

View →
cs.CRcs.AIRecentMar 22, 2026

When Convenience Becomes Risk: A Semantic View of Under-Specification in Host-Acting Agents

Di Lu, Yongzhi Liao, Xutong Mu, Lele Zheng +4 more

The paper identifies that the convenience of host-acting agents leads to semantic under-specification in user goals, which forces the agent to generate potentially risky execution plans.

View →