Rui Li

25 indexed papers

Recent (6 mo)

With code

Influential cites

Benchmarked

Publications per year

Top categories

AI×21Crypto×9ML×8Vision×6NLP×6Robotics×2Multimedia×1Architecture×1

Frequent co-authors

Dongrui Liu8×

Xia Hu5×

Leitao Yuan4×

Jing Shao4×

Wenjie Wang4×

Wen Shen3×

Research Timeline

2026

Do LLMs Build World Models From Text? A Multilingual Diagnostic of Spatial Reasoning

The paper introduces a multilingual benchmark (MentalMap) to test if LLMs build internal spatial world models from text, finding a universal 'L3 reasoning cliff' suggesting that text-only working memory is the primary bottleneck.

When Should Models Change Their Minds? Contextual Belief Management in Large Language Models

The paper introduces Contextual Belief Management (CBM) to address how LLMs should manage accumulating information over long interactions, showing that reinforcement learning significantly improves belief state accuracy.

AgentDoG 1.5: A Lightweight and Scalable Alignment Framework for AI Agent Safety and Security

The paper introduces AgentDoG 1.5, a lightweight and scalable alignment framework that significantly improves AI agent safety and security for complex, open-world agentic scenarios.

UI-KOBE: Knowledge-Oriented Behavior Exploration for Lightweight Graph-Guided GUI Agents

UI-KOBE is a framework that enhances lightweight mobile GUI agents by integrating reusable, app-specific knowledge graphs, allowing them to perform complex tasks efficiently on-device without relying on large vision-language models.

AgentDoG 1.5: A Lightweight and Scalable Alignment Framework for AI Agent Safety and Security

The paper introduces AgentDoG 1.5, a lightweight and scalable alignment framework that significantly improves AI agent safety and security for complex open-world agent deployments.

COLLEAGUE.SKILL: Automated AI Skill Generation via Expert Knowledge Distillation

COLLEAGUE.SKILL introduces an automated system that distills heterogeneous traces of human expertise and role-specific knowledge into portable, inspectable, and usable AI skill packages.

Fighting Numerical Hallucinations via Data-centric Compilation for Online Financial QA

The paper introduces the Data-centric Reasoning Compiler (DCRC), a novel data-driven framework that enhances financial QA systems by compiling user queries and retrieved documents into verifiable, executable programs to prevent numerical hallucinations.

A physics-informed foundation model for quantitative diffusion MRI

The paper introduces PIGMENT, a physics-informed foundation model that enables reliable quantitative mapping of brain microstructure from extremely sparse or challenging diffusion MRI scans.

PrivacyPeek: Auditing What LLM-Based Agents Acquire, Not Just What They Say

The paper introduces PrivacyPeek, a new benchmark that audits the acquisition stage of LLM-based agents to show that unnecessary and sensitive data acquisition is a widespread and critical privacy vulnerability.

PrivacyPeek: Auditing What LLM-Based Agents Acquire, Not Just What They Say

The paper introduces PrivacyPeek, a new benchmark that audits the acquisition stage of LLM-based agents to demonstrate that unnecessary acquisition of sensitive data is a widespread and critical privacy vulnerability.

Send a SCOUT First: Pre-hoc Reasoning for Adaptive Detector Allocation in Prompt-Injection Defense

The paper introduces SCOUT, a dynamic detector allocation framework that improves prompt-injection defense by predicting detector reliability and latency to optimize the trade-off between safety and operational utility.

PaCo-VLA: Passivity-Shielded Compliance Prior for Contact-Rich Vision-Language-Action Manipulation

PaCo-VLA introduces a passivity-shielded compliance prior to safely bridge the gap between high-level Vision-Language-Action (VLA) semantic outputs and low-level, force-sensitive robotic control.

SIRIUS-SQL: Anchoring Multi-Candidate Text-to-SQL in Execution Feedback

SIRIUS-SQL introduces a robust multi-candidate text-to-SQL system that addresses weaknesses in candidate generation, error handling, and selection, achieving state-of-the-art performance on complex benchmarks.

Can LLM Agents Sustain Long-Horizon Organizational Dynamics?

The paper introduces TaskWeave, a hierarchical agentic framework that successfully simulates long-horizon organizational dynamics by treating coordination as a memory-centered problem, demonstrating that structured memory is key to reliable LLM-based simulations.

Linear Complexity Fermionic Simulation on Quantum Devices with Hardware Connectivity Constraints

The paper introduces Accordion, an end-to-end framework that significantly improves the efficiency of compiling fermionic Hamiltonians into quantum circuits for simulation on constrained quantum hardware.

HLL: Can Agents Cross Humanity's Last Line of Verification?

The paper introduces HLL, a benchmark that tests if multimodal agents can successfully substitute for human verification (like CAPTCHA) in complex, real-world workflows, finding that current agents are still brittle and fail under realistic conditions.

SafeSteer: Localized On-Policy Distillation for Efficient Safety Alignment

SafeSteer proposes a localized on-policy distillation method that restricts safety alignment to specific safety tokens, thereby achieving strong safety performance with minimal degradation to general capabilities and significantly reducing data requirements.

X-Stream: Exploring MLLMs as Multiplexers for Multi-Stream Understanding

The paper introduces X-Stream, a new benchmark for multi-stream video understanding, and finds that current state-of-the-art MLLMs perform poorly when required to process multiple concurrent video streams.

Humanoid-GPT: Scaling Data and Structure for Zero-Shot Motion Tracking

The paper introduces Humanoid-GPT, a large-scale generative Transformer model that achieves robust zero-shot motion tracking and control by training on a massive, unified corpus of motion data.

From Agent Traces to Trust: Evidence Tracing and Execution Provenance in LLM Agents

This survey provides a systematic framework and taxonomy for evidence tracing and execution provenance in LLM agents, addressing the difficulty of verifying and auditing complex agent behaviors.

Highlighted terms show continued research focus across papers

Papers

cs.CRcs.AIRecentJun 3, 2026

From Agent Traces to Trust: Evidence Tracing and Execution Provenance in LLM Agents

Yiqi Wang, Jiaqi Zhang, Taotao Cai, Zirui Liu +5 more

This survey provides a systematic framework and taxonomy for evidence tracing and execution provenance in LLM agents, addressing the difficulty of verifying and auditing complex agent behaviors.

View →

cs.ROcs.AIcs.CVRecentJun 2, 2026