Ao Xu
25 indexed papers
Publications per year
Top categories
Frequent co-authors
Research Timeline
The paper argues that current AI content watermarking benchmarks fail to test for bias across different languages, cultures, and demographics, proposing a new set of evaluation standards to ensure fairness in content provenance.
This paper provides a systematic, layered review of security risks and defense strategies for autonomous agent frameworks, using OpenClaw as a case study to address the current lack of integrated research.
ClawGuard introduces a passive, out-of-band security monitor that detects LLM agent workflow hijacking by analyzing unique electromagnetic (EM) emanations generated during agent skill execution.
The paper proposes Safety Context Injection (SCI), an inference-time framework that prepends a structured external risk report to protect Large Reasoning Models (LRMs) against sophisticated jailbreaks, significantly reducing attack success rates.
The paper introduces QuartetFuzz, an autonomous system that systematically ensures the correctness of fuzzing harnesses using a novel Four Principles framework, significantly improving vulnerability discovery and identifying existing harness flaws.
FuzzingBrain V2 is a multi-agent LLM system that significantly improves automated vulnerability discovery by ensuring all reported bugs are fuzzer-reproducible and handling complex cross-function dependencies.
The paper proposes ProRL, an effective Reinforcement Learning framework that rectifies gradient estimation deficiencies to optimize proactive recommendation paths, significantly outperforming existing state-of-the-art methods.
The paper introduces Canonical-Context On-Policy Distillation (CCOPD) to improve multi-turn language model performance by mitigating 'self-anchored drift,' ensuring consistent answers regardless of whether the evidence is presented in a single prompt or gradually across multiple turns.
The paper proposes SafeDIG, a robust safety steering framework that adapts Diffusion Transformers for text-to-image generation by treating safety control as position-aware sparse feature transfer, ensuring reliable safety across different risk domains.
The paper introduces StreamSynth, a sequential setting for synthetic data generation, and proposes SynLearner, a framework that enables LLMs to improve synthesis performance by accumulating and transferring experience across a stream of tasks.
The paper introduces EvoXXLTraffic, an ultra-large, sensor-evolving dataset that simulates real-world road network growth, demonstrating that existing state-of-the-art traffic forecasting models fail when faced with such dynamic network changes.
The paper proposes Guided Denoiser Self-Distillation (GDSD), a novel method that bypasses the use of likelihood surrogates (like ELBO) in RL for diffusion language models, achieving state-of-the-art performance on complex benchmarks.
The paper introduces TouchSafeBench, a physics-grounded benchmark, to evaluate collision grounding—the ability to predict robot-human collisions—and finds that current Vision-Language Models (VLMs) are unreliable for safe human-robot collaboration.
The paper introduces ConsisGuard, a framework that addresses the 'deliberation-to-enforcement gap' in LLM guardrails by ensuring that the reasoning process is faithfully and consistently translated into the final safety decision.
The paper proposes MADS, a Model-Aware Diverse Core Set Selection method that uses LLM internal activation states to select a small, diverse core set of instructions, significantly improving model performance while reducing data requirements.
EnergyMamba proposes an uncertainty-aware, graph-enhanced selective state space model to significantly improve both the accuracy and reliability of energy consumption prediction by explicitly modeling spatial dependencies.
The paper proposes Task-Aware Coactivation Grouping (TACG) to significantly reduce communication costs in multi-task MoE inference by grouping experts based on task-specific co-activation patterns, outperforming task-agnostic methods.
The paper proposes a Doeblin-anchored contrastive chart to learn valid Markov transition kernels by combining the target transition with a restart law, ensuring the learned object is mathematically sound for dynamical systems.
THRD introduces a novel, training-free framework that models temporal risk accumulation to effectively defend against multi-turn jailbreak attacks on LLMs, significantly reducing attack success rates while maintaining model utility.
The paper introduces TVIR, a new benchmark and multi-agent framework for deep research, to evaluate and improve the generation of factually reliable, text-visual interleaved reports.
Papers
A Doeblin-Anchored Contrastive Chart for Learning Markov Transition Kernels
The paper proposes a Doeblin-anchored contrastive chart to learn valid Markov transition kernels by combining the target transition with a restart law, ensuring the learned object is mathematically so…