Peng Li
20 indexed papers
Research Timeline
This paper proposes a set of design principles and a conceptual benchmark (SOC-bench) to systematically evaluate the blue-team operational capabilities of multi-agent AI systems in autonomous Security Operation Centers (SOCs).
The paper introduces MirageBackdoor, a highly stealthy backdoor attack that forces Large Language Models to generate correct reasoning steps (Think Well) but output an incorrect final answer (Answer Wrong), bypassing existing detection methods.
The paper introduces IOCRegex-gen, an automated LLM-based system that converts Indicators of Compromise (IOCs) into syntactically and semantically correct regular expressions, achieving high accuracy in large-scale CTI processing.
The paper introduces a record-and-replay detection mechanism that identifies the true avalanche effect in ransomware, achieving high accuracy against real-world samples.
The paper proposes SATA, a semantics-aware traffic augmentation framework, to significantly improve the generalization of website fingerprinting models by addressing variability in resource composition and cross-layer feature instability.
The paper introduces FIDO, a novel framework that significantly boosts firmware fuzzing efficiency by accurately managing the timing and quantity of input delivery based on the firmware's internal input availability checks.
This paper demonstrates that typographic attacks pose a significant, measurable, and physically consequential threat to household robot manipulation systems by causing the robot to grasp and transport the wrong objects.
The paper introduces HRBench, a unified and comprehensive evaluation framework for systematically benchmarking and comparing various thinking-mode switching strategies in hybrid-reasoning LLMs.
The paper introduces PetroBench, a comprehensive benchmark for evaluating Large Language Models across various domains of petroleum engineering, finding that models perform better on subjective tasks than on objective factual knowledge.
The paper proposes a pose-conditioned, permutation-equivariant denoiser to accurately reconstruct work zone geometry using noisy Ultra-Wideband (UWB) range data from connected and autonomous vehicles (CAVs).
The paper introduces dynamic, per-request separator generation for Polymorphic Prompt Assembling (PPA), significantly reducing the blast radius of prompt injection attacks by ensuring that every request uses a unique separator.
The paper proposes GRiD, a novel framework that uses a two-phase training strategy (supervised pre-training and RL fine-tuning) to discover complex, graph-like rules for knowledge graph reasoning, overcoming limitations of existing methods.
The paper introduces MAAD, a multi-agent framework that autonomously transforms software requirements into comprehensive, multi-view architectural blueprints, significantly improving completeness and reducing manual validation.
The paper proposes a novel, explicitly exploratory iterative Nash Learning from Human Feedback (NLHF) algorithm that achieves strong regret bounds for optimizing LLMs based on complex, non-scalar human preferences.
This paper systematically studies how soft errors propagate during Large Language Model (LLM) inference using a novel fault-injection framework, providing critical insights and mitigation strategies for improving LLM reliability.
The paper argues that observed gains in multimodal agents using tools may be due to learning tool-calling patterns rather than genuine capability expansion, finding that tool access provides little consistent aggregate improvement.
The paper introduces TROPHIES, a unified framework that jointly reconstructs dynamic humans, static scenes, and camera poses from multi-view videos, achieving globally consistent and physically plausible 4D reconstructions.
The paper introduces DocFormBench, a new benchmark for content-aware document formatting, and proposes DocFormFlow, a workflow that improves formatting accuracy and efficiency by decoupling target localization from modification execution.
This paper identifies two novel location inference attacks against k-nearest neighbor queries (kNNQ) and proposes DPRS, a differential privacy framework that effectively protects location privacy while maintaining high query utility.
The paper presents Tahoe, a system that optimizes Text-to-SQL performance through dynamic data management and hint learning.