Built with and by Teycir Ben Soltane•
How to Use•FAQ•GitHub•arXiv.org•
Share:
ArXivCSExplorer
☆☆Bookmarks🏆RSSHow to UseFAQ
Home/Authors/Peng Li

Peng Li

20 indexed papers

Recent (6 mo)
20
With code
0
Influential cites
0
Benchmarked
0

Publications per year

20
26

Top categories

AI×11Crypto×9Vision×2Software Eng.×2ML×2Robotics×2Databases×1Distributed×1

Frequent co-authors

Peng Liu7×
Xiang Wang2×
Xiaopeng Li2×
Zhiyi Chen1×
Jie Song1×
Zhiyu Sun1×

Research Timeline

2026
Design Principles for the Construction of a Benchmark Evaluating Security Operation Capabilities of Multi-agent AI Systems

This paper proposes a set of design principles and a conceptual benchmark (SOC-bench) to systematically evaluate the blue team operational capabilities of multi-agent AI systems in autonomous Security Operation Centers (SOCs).

MirageBackdoor: A Stealthy Attack that Induces Think-Well-Answer-Wrong Reasoning

MirageBackdoor introduces a novel, highly stealthy backdoor attack that forces Large Language Models to generate correct reasoning steps (Think Well) but output an incorrect final answer (Answer Wrong), bypassing existing detection methods.

From IOCs to Regex: Automating CTI Operationalization for SOC with LLMs

The paper introduces IOCRegex-gen, an automated LLM-based system that converts Indicators of Compromise (IOCs) into syntactically and semantically correct regular expressions, achieving high accuracy in large-scale CTI processing.

Detecting Avalanche Effect in Adversarial Settings: Spotting the Encryption Loops in Ransomware

The paper introduces a novel record-and-replay detection mechanism to accurately detect the true avalanche effect in ransomware, achieving high accuracy against real-world samples.

More Than Meets the Eye: A Semantics-Aware Traffic Augmentation Framework for Generalizable Website Fingerprinting

The paper proposes SATA, a semantics-aware traffic augmentation framework, to significantly improve the generalization of website fingerprinting models by addressing variability in resource composition and cross-layer feature instability.

Stop Starving or Stuffing Me: Boosting Firmware Fuzzing Efficiency with On-demand Input Delivery

The paper introduces FIDO, a novel framework that significantly boosts firmware fuzzing efficiency by accurately managing the timing and quantity of input delivery based on the firmware's internal input availability checks.

Not What You Asked For: Typographic Attacks in Household Robot Manipulation

This paper demonstrates that typographic attacks pose a significant, measurable, and physically consequential threat to household robot manipulation systems by causing the robot to grasp and transport the wrong objects.

HRBench: Benchmarking and Understanding Thinking-Mode Switch Strategies in Hybrid-Reasoning LLMs

The paper introduces HRBench, a unified and comprehensive evaluation framework for systematically benchmarking and comparing various thinking-mode switching strategies in hybrid-reasoning LLMs.

PetroBench: A Benchmark for Large Language Models in Petroleum Engineering

The paper introduces PetroBench, a comprehensive benchmark for evaluating Large Language Models across various domains of petroleum engineering, finding that models perform better on subjective tasks than on objective factual knowledge.

V2I Work Zone Geometry Reconstruction with Pose-Conditioned UWB Range Denoising

The paper proposes a pose-conditioned, permutation-equivariant denoiser to accurately reconstruct work zone geometry using noisy Ultra-Wideband (UWB) range data from connected and autonomous vehicles (CAVs).

Strengthening Polymorphic Prompt Assembling: Dynamic Separator Generation Against Emerging Prompt Injection Attacks

The paper introduces dynamic, per-request separator generation for Polymorphic Prompt Assembling (PPA), significantly reducing the blast-radius vulnerability to prompt injection attacks by ensuring unique separators for every request.

Generating Graph-like Rules for Knowledge Graph Reasoning via Diffusion Models

The paper proposes GRiD, a novel framework that uses a two-phase training strategy (supervised pre-training and RL fine-tuning) to discover complex, graph-like rules for knowledge graph reasoning, overcoming limitations of existing methods.

Bridging Requirements and Architecture: Multi-Agent Orchestration with External Knowledge and Hierarchical Memory

The paper introduces MAAD, a multi-agent framework that autonomously transforms software requirements into comprehensive, multi-view architectural blueprints, significantly improving completeness and reducing manual validation.

Efficient Exploration for Iterative Nash Preference Optimization

The paper proposes a novel, explicitly exploratory iterative Nash Learning from Human Feedback (NLHF) algorithm that achieves strong regret bounds for optimizing LLMs based on complex, non-scalar human preferences.

Not All Errors Are Equal: A Systematic Study of Error Propagation in Large Language Model Inference

This paper systematically studies how soft errors propagate during Large Language Model (LLM) inference using a novel fault-injection framework, providing critical insights and mitigation strategies for improving LLM reliability.

Do Multimodal Agents Really Benefit from Tool Use? A Systematic Study of Capability Gains

The paper argues that observed gains in multimodal agents using tools may be due to learning tool-calling patterns rather than genuine capability expansion, finding that tool access provides little consistent aggregate improvement.

TROPHIES: Temporal Reconstruction of Places, Humans, and Cameras from Multi-view Videos

TROPHIES introduces a unified framework to jointly reconstruct dynamic humans, static scenes, and camera poses from multi-view videos, achieving globally consistent and physically plausible 4D reconstructions.

What to Format and How: A Benchmark and Workflow Approach for Document Formatting

The paper introduces DocFormBench, a new benchmark for content-aware document formatting, and proposes DocFormFlow, a workflow that improves formatting accuracy and efficiency by decoupling target localization from modification execution.

Protecting K-Nearest Neighbor Queries from Location Inference Attacks

This paper identifies two novel location inference attacks against k-nearest neighbor queries (kNNQ) and proposes DPRS, a differential privacy framework that effectively protects location privacy while maintaining high query utility.

TAHOE: Text-to-SQL with Automated Hint Optimization from Experience

The paper presents Tahoe, a system that optimizes Text-to-SQL performance through dynamic data management and hint learning.

Highlighted terms show continued research focus across papers

Papers

cs.DBcs.AIEmpiricalRecentJun 10, 2026

TAHOE: Text-to-SQL with Automated Hint Optimization from Experience

Zhiyi Chen, Jie Song, Peng Li

The paper presents Tahoe, a system that optimizes Text-to-SQL performance through dynamic data management and hint learning.

View →
cs.CRRecentJun 4, 2026

Protecting K-Nearest Neighbor Queries from Location Inference Attacks

Zhiyu Sun, Jie Fu, Xinpeng Ling, Huifa Li +1 more

This paper identifies two novel location inference attacks against k-nearest neighbor queries (kNNQ) and proposes DPRS, a differential privacy framework that effectively protects location privacy whil…

View →
cs.DCcs.AIRecentJun 1, 2026

Not All Errors Are Equal: A Systematic Study of Error Propagation in Large Language Model Inference

Yafan Huang, Sheng Di, Guanpeng Li

This paper systematically studies how soft errors propagate during Large Language Model (LLM) inference using a novel fault-injection framework, providing critical insights and mitigation strategies f…

View →
cs.CVcs.AIRecentJun 1, 2026

Do Multimodal Agents Really Benefit from Tool Use? A Systematic Study of Capability Gains

Garvin Guo, Donglei Yu, Yu Chen, Xiang Wang +5 more

The paper argues that observed gains in multimodal agents using tools may be due to learning tool-calling patterns rather than genuine capability expansion, finding that tool access provides little co…

View →
cs.CVRecentJun 1, 2026

TROPHIES: Temporal Reconstruction of Places, Humans, and Cameras from Multi-view Videos

Jinpeng Liu, Yukang Xu, Yutong Li, Xingyu Liu

TROPHIES introduces a unified framework to jointly reconstruct dynamic humans, static scenes, and camera poses from multi-view videos, achieving globally consistent and physically plausible 4D reconst…

View →
cs.CLRecentJun 1, 2026

What to Format and How: A Benchmark and Workflow Approach for Document Formatting

Shihao Rao, Liang Li, Jiapeng Liu, Tong Lin +5 more

The paper introduces DocFormBench, a new benchmark for content-aware document formatting, and proposes DocFormFlow, a workflow that improves formatting accuracy and efficiency by decoupling target loc…

View →
cs.SEcs.AIRecentMay 31, 2026

Bridging Requirements and Architecture: Multi-Agent Orchestration with External Knowledge and Hierarchical Memory

Ruiyin Li, Yiran Zhang, Xiyu Zhou, Yangxiao Cai +5 more

The paper introduces MAAD, a multi-agent framework that autonomously transforms software requirements into comprehensive, multi-view architectural blueprints, significantly improving completeness and…

View →
cs.LGcs.AIRecentMay 31, 2026

Efficient Exploration for Iterative Nash Preference Optimization

Tianlong Nan, Xiaopeng Li, Christian Kroer, Tianyi Lin

The paper proposes a novel, explicitly exploratory iterative Nash Learning from Human Feedback (NLHF) algorithm that achieves strong regret bounds for optimizing LLMs based on complex, non-scalar huma…

View →
cs.AIRecentMay 29, 2026

Generating Graph-like Rules for Knowledge Graph Reasoning via Diffusion Models

Haoxiang Cheng, Yunfei Wang, Chao Chen, Kewei Cheng +4 more

The paper proposes GRiD, a novel framework that uses a two-phase training strategy (supervised pre-training and RL fine-tuning) to discover complex, graph-like rules for knowledge graph reasoning, ove…

View →
cs.ROcs.AIRecentMay 28, 2026

V2I Work Zone Geometry Reconstruction with Pose-Conditioned UWB Range Denoising

Jiaxi Liu, Hangyu Li, Yang Cheng, Rui Gana +6 more

The paper proposes a pose-conditioned, permutation-equivariant denoiser to accurately reconstruct work zone geometry using noisy Ultra-Wideband (UWB) range data from connected and autonomous vehicles…

View →
cs.CRRecentMay 28, 2026

Strengthening Polymorphic Prompt Assembling: Dynamic Separator Generation Against Emerging Prompt Injection Attacks

Nima Dorzhiev, Peng Liu

The paper introduces dynamic, per-request separator generation for Polymorphic Prompt Assembling (PPA), significantly reducing the blast-radius vulnerability to prompt injection attacks by ensuring un…

View →
cs.AIRecentMay 27, 2026

HRBench: Benchmarking and Understanding Thinking-Mode Switch Strategies in Hybrid-Reasoning LLMs

Yansong Ning, Mianpeng Liu, Jingwen Ye, Weidong Zhang +1 more

The paper introduces HRBench, a unified and comprehensive evaluation framework for systematically benchmarking and comparing various thinking-mode switching strategies in hybrid-reasoning LLMs.

View →
cs.AIRecentMay 27, 2026

PetroBench: A Benchmark for Large Language Models in Petroleum Engineering

Xiang Wang, Tingting Zhang, Sen Wang, Ying Wu +3 more

The paper introduces PetroBench, a comprehensive benchmark for evaluating Large Language Models across various domains of petroleum engineering, finding that models perform better on subjective tasks…

View →
cs.CRcs.AIcs.RORecentMay 18, 2026

Not What You Asked For: Typographic Attacks in Household Robot Manipulation

Ali Iranmanesh, Peng Liu

This paper demonstrates that typographic attacks pose a significant, measurable, and physically consequential threat to household robot manipulation systems by causing the robot to grasp and transport…

View →
cs.CRcs.SERecentMay 16, 2026

Stop Starving or Stuffing Me: Boosting Firmware Fuzzing Efficiency with On-demand Input Delivery

Shandian Shen, Wei Zhou, Keming Zhao, Peng Liu +2 more

The paper introduces FIDO, a novel framework that significantly boosts firmware fuzzing efficiency by accurately managing the timing and quantity of input delivery based on the firmware's internal inp…

View →
cs.LGcs.CRcs.NIRecentMay 12, 2026

More Than Meets the Eye: A Semantics-Aware Traffic Augmentation Framework for Generalizable Website Fingerprinting

Youquan Xian, Xueying Zeng, Lingjia Meng, Lei Cui +5 more

The paper proposes SATA, a semantics-aware traffic augmentation framework, to significantly improve the generalization of website fingerprinting models by addressing variability in resource compositio…

View →
cs.CRRecentApr 27, 2026

Detecting Avalanche Effect in Adversarial Settings: Spotting the Encryption Loops in Ransomware

Nanqing Luo, Xusheng Li, Haizhou Wang, Shuangyi Zhu +2 more

The paper introduces a novel record-and-replay detection mechanism to accurately detect the true avalanche effect in ransomware, achieving high accuracy against real-world samples.

View →
cs.CRRecentApr 14, 2026

From IOCs to Regex: Automating CTI Operationalization for SOC with LLMs

Pei-Yu Tseng, Lan Zhang, ZihDwo Yeh, Xiaoyan Sun +2 more

The paper introduces IOCRegex-gen, an automated LLM-based system that converts Indicators of Compromise (IOCs) into syntactically and semantically correct regular expressions, achieving high accuracy…

View →
cs.CRRecentApr 8, 2026

MirageBackdoor: A Stealthy Attack that Induces Think-Well-Answer-Wrong Reasoning

Yizhe Zeng, Wei Zhang, Yunpeng Li, Juxin Xiao +2 more

MirageBackdoor introduces a novel, highly stealthy backdoor attack that forces Large Language Models to generate correct reasoning steps (Think Well) but output an incorrect final answer (Answer Wrong…

View →
cs.CRcs.AIRecentMar 30, 2026

Design Principles for the Construction of a Benchmark Evaluating Security Operation Capabilities of Multi-agent AI Systems

Yicheng Cai, Mitchell John DeStefano, Guodong Dong, Pulkit Handa +4 more

This paper proposes a set of design principles and a conceptual benchmark (SOC-bench) to systematically evaluate the blue team operational capabilities of multi-agent AI systems in autonomous Security…

View →