Wei Xi
15 indexed papers
Publications per year
Top categories
Frequent co-authors
Research Timeline
The paper identifies that background 'heartbeat' execution in personal AI agents like Claw can silently pollute the agent's memory with external misinformation, influencing user behavior without the user's knowledge or explicit prompt injection.
This survey provides a comprehensive, structured review of safety research in Embodied AI, analyzing attacks and defenses across the entire embodied pipeline to guide the development of safe, robust, and reliable real-world agents.
Sneakdoor introduces a novel backdoor attack method that enhances stealthiness in dataset condensation by using a generative module to create input-aware triggers, achieving high attack efficacy while remaining imperceptible.
The paper proposes a vision for system-level defenses against indirect prompt injection attacks targeting AI agents, emphasizing structured control and human oversight.
The paper introduces FoodGuardBench, a comprehensive benchmark and a specialized guardrail model (FoodGuard-4B) to rigorously test and mitigate the severe food safety risks posed by large language models.
FedIDM introduces a novel federated learning framework that uses iterative distribution matching to achieve fast and stable convergence and maintain high model utility even when facing a large proportion of Byzantine malicious clients.
TwinGate introduces a stateful dual-encoder defense framework using Asymmetric Contrastive Learning to detect malicious intent from fragmented, untraceable LLM queries with high recall and low false positives.
This paper introduces personalized mechanisms for estimating streaming statistics under $w$-event personalized differential privacy, significantly improving accuracy compared to existing methods.
The paper introduces Latent Policy Guardrail (LPG), a novel framework that efficiently enforces dynamic safety policies for LLMs by compressing complex policy deliberation into a small set of latent tokens, achieving high accuracy with significantly reduced latency.
This paper investigates the non-monotonic role of sample difficulty in Reinforcement Learning with Verifiable Reward (RLVR), finding that medium-difficulty problems provide the most balanced and beneficial learning signals for LLMs.
The paper introduces an operator-level factorial benchmark for molecular MPNNs, finding that message construction (specifically concatenation-based mixing) is the primary determinant of performance, rather than the complexity of the node update mechanism.
The paper introduces Metacognitive Memory Policy Optimization (MMPO), a novel memory training approach that optimizes LLM memory not based on final task success, but on minimizing epistemic uncertainty in intermediate summaries, significantly improving long-horizon agent performance.
PatchWorld introduces a gradient-free framework to create executable Python world models from offline trajectories, achieving high planning scores by inducing symbolic belief-state programs.
SkillRevise is an execution-grounded framework that iteratively refines initial, imperfect LLM agent skills by diagnosing defects from execution evidence and applying empirically validated edits, significantly boosting agent performance.
MaskForge is a novel, adaptive, black-box attack framework that significantly improves jailbreaking diffusion large language models (dLLMs) by treating red-teaming as an optimized search over reusable structural patterns.
Papers
MaskForge: Structure-Aware Adaptive Attacks for Jailbreaking Diffusion Large Language Models
Yingzi Ma, Zhengyue Zhao, Xiaogeng Liu, Minhui Xue +2 more
MaskForge is a novel, adaptive, black-box attack framework that significantly improves jailbreaking diffusion large language models (dLLMs) by treating red-teaming as an optimized search over reusable…