Built with and by Teycir Ben Soltane•
How to Use•FAQ•GitHub•arXiv.org•
Share:
ArXivCSExplorer
☆☆Bookmarks🏆RSSHow to UseFAQ
Home/Authors/Rui Li

Rui Li

25 indexed papers

Recent (6 mo)
25
With code
0
Influential cites
0
Benchmarked
0

Publications per year

25
26

Top categories

AI×21Crypto×9ML×8Vision×6NLP×6Robotics×2Multimedia×1Architecture×1

Frequent co-authors

Dongrui Liu8×
Xia Hu5×
Leitao Yuan4×
Jing Shao4×
Wenjie Wang4×
Wen Shen3×

Research Timeline

2026
Do LLMs Build World Models From Text? A Multilingual Diagnostic of Spatial Reasoning

The paper introduces a multilingual benchmark (MentalMap) to test if LLMs build internal spatial world models from text, finding a universal 'L3 reasoning cliff' suggesting that text-only working memory is the primary bottleneck.

When Should Models Change Their Minds? Contextual Belief Management in Large Language Models

The paper introduces Contextual Belief Management (CBM) to address how LLMs should manage accumulating information over long interactions, showing that reinforcement learning significantly improves belief state accuracy.

AgentDoG 1.5: A Lightweight and Scalable Alignment Framework for AI Agent Safety and Security

The paper introduces AgentDoG 1.5, a lightweight and scalable alignment framework that significantly improves AI agent safety and security for complex, open-world agentic scenarios.

UI-KOBE: Knowledge-Oriented Behavior Exploration for Lightweight Graph-Guided GUI Agents

UI-KOBE is a framework that enhances lightweight mobile GUI agents by integrating reusable, app-specific knowledge graphs, allowing them to perform complex tasks efficiently on-device without relying on large vision-language models.

AgentDoG 1.5: A Lightweight and Scalable Alignment Framework for AI Agent Safety and Security

The paper introduces AgentDoG 1.5, a lightweight and scalable alignment framework that significantly improves AI agent safety and security for complex open-world agent deployments.

COLLEAGUE.SKILL: Automated AI Skill Generation via Expert Knowledge Distillation

COLLEAGUE.SKILL introduces an automated system that distills heterogeneous traces of human expertise and role-specific knowledge into portable, inspectable, and usable AI skill packages.

Fighting Numerical Hallucinations via Data-centric Compilation for Online Financial QA

The paper introduces the Data-centric Reasoning Compiler (DCRC), a novel data-driven framework that enhances financial QA systems by compiling user queries and retrieved documents into verifiable, executable programs to prevent numerical hallucinations.

A physics-informed foundation model for quantitative diffusion MRI

The paper introduces PIGMENT, a physics-informed foundation model that enables reliable quantitative mapping of brain microstructure from extremely sparse or challenging diffusion MRI scans.

PrivacyPeek: Auditing What LLM-Based Agents Acquire, Not Just What They Say

The paper introduces PrivacyPeek, a new benchmark that audits the acquisition stage of LLM-based agents to show that unnecessary and sensitive data acquisition is a widespread and critical privacy vulnerability.

PrivacyPeek: Auditing What LLM-Based Agents Acquire, Not Just What They Say

The paper introduces PrivacyPeek, a new benchmark that audits the acquisition stage of LLM-based agents to demonstrate that unnecessary acquisition of sensitive data is a widespread and critical privacy vulnerability.

Send a SCOUT First: Pre-hoc Reasoning for Adaptive Detector Allocation in Prompt-Injection Defense

The paper introduces SCOUT, a dynamic detector allocation framework that improves prompt-injection defense by predicting detector reliability and latency to optimize the trade-off between safety and operational utility.

PaCo-VLA: Passivity-Shielded Compliance Prior for Contact-Rich Vision-Language-Action Manipulation

PaCo-VLA introduces a passivity-shielded compliance prior to safely bridge the gap between high-level Vision-Language-Action (VLA) semantic outputs and low-level, force-sensitive robotic control.

SIRIUS-SQL: Anchoring Multi-Candidate Text-to-SQL in Execution Feedback

SIRIUS-SQL introduces a robust multi-candidate text-to-SQL system that addresses weaknesses in candidate generation, error handling, and selection, achieving state-of-the-art performance on complex benchmarks.

Can LLM Agents Sustain Long-Horizon Organizational Dynamics?

The paper introduces TaskWeave, a hierarchical agentic framework that successfully simulates long-horizon organizational dynamics by treating coordination as a memory-centered problem, demonstrating that structured memory is key to reliable LLM-based simulations.

Linear Complexity Fermionic Simulation on Quantum Devices with Hardware Connectivity Constraints

The paper introduces Accordion, an end-to-end framework that significantly improves the efficiency of compiling fermionic Hamiltonians into quantum circuits for simulation on constrained quantum hardware.

HLL: Can Agents Cross Humanity's Last Line of Verification?

The paper introduces HLL, a benchmark that tests if multimodal agents can successfully substitute for human verification (like CAPTCHA) in complex, real-world workflows, finding that current agents are still brittle and fail under realistic conditions.

SafeSteer: Localized On-Policy Distillation for Efficient Safety Alignment

SafeSteer proposes a localized on-policy distillation method that restricts safety alignment to specific safety tokens, thereby achieving strong safety performance with minimal degradation to general capabilities and significantly reducing data requirements.

X-Stream: Exploring MLLMs as Multiplexers for Multi-Stream Understanding

The paper introduces X-Stream, a new benchmark for multi-stream video understanding, and finds that current state-of-the-art MLLMs perform poorly when required to process multiple concurrent video streams.

Humanoid-GPT: Scaling Data and Structure for Zero-Shot Motion Tracking

The paper introduces Humanoid-GPT, a large-scale generative Transformer model that achieves robust zero-shot motion tracking and control by training on a massive, unified corpus of motion data.

From Agent Traces to Trust: Evidence Tracing and Execution Provenance in LLM Agents

This survey provides a systematic framework and taxonomy for evidence tracing and execution provenance in LLM agents, addressing the difficulty of verifying and auditing complex agent behaviors.

Highlighted terms show continued research focus across papers

Papers

cs.CRcs.AIRecentJun 3, 2026

From Agent Traces to Trust: Evidence Tracing and Execution Provenance in LLM Agents

Yiqi Wang, Jiaqi Zhang, Taotao Cai, Zirui Liu +5 more

This survey provides a systematic framework and taxonomy for evidence tracing and execution provenance in LLM agents, addressing the difficulty of verifying and auditing complex agent behaviors.

View →
cs.ROcs.AIcs.CVRecentJun 2, 2026

Humanoid-GPT: Scaling Data and Structure for Zero-Shot Motion Tracking

Zekun Qi, Xuchuan Chen, Dairu Liu, Chenghuai Lin +9 more

The paper introduces Humanoid-GPT, a large-scale generative Transformer model that achieves robust zero-shot motion tracking and control by training on a massive, unified corpus of motion data.

View →
cs.AIcs.CLcs.CVRecentJun 1, 2026

HLL: Can Agents Cross Humanity's Last Line of Verification?

Xinhao Song, Su Su, Sirui Song, Hongliang Wu +5 more

The paper introduces HLL, a benchmark that tests if multimodal agents can successfully substitute for human verification (like CAPTCHA) in complex, real-world workflows, finding that current agents ar…

View →
cs.AIcs.CLRecentJun 1, 2026

SafeSteer: Localized On-Policy Distillation for Efficient Safety Alignment

Hao Li, Jingkun An, Zijun Song, Pengyu Zhu +7 more

SafeSteer proposes a localized on-policy distillation method that restricts safety alignment to specific safety tokens, thereby achieving strong safety performance with minimal degradation to general…

View →
cs.CVRecentJun 1, 2026

X-Stream: Exploring MLLMs as Multiplexers for Multi-Stream Understanding

Peiwen Sun, Xudong Lu, Huadai Liu, Yang Bo +8 more

The paper introduces X-Stream, a new benchmark for multi-stream video understanding, and finds that current state-of-the-art MLLMs perform poorly when required to process multiple concurrent video str…

View →
cs.AIRecentMay 31, 2026

SIRIUS-SQL: Anchoring Multi-Candidate Text-to-SQL in Execution Feedback

Leo Luo, Haining Xie, Siqi Shen, Zhipeng Ma +7 more

SIRIUS-SQL introduces a robust multi-candidate text-to-SQL system that addresses weaknesses in candidate generation, error handling, and selection, achieving state-of-the-art performance on complex be…

View →
cs.AIRecentMay 31, 2026

Can LLM Agents Sustain Long-Horizon Organizational Dynamics?

Xuancheng Zhu, Yang Yue, Shuaibing Wan, Zihan Dou +3 more

The paper introduces TaskWeave, a hierarchical agentic framework that successfully simulates long-horizon organizational dynamics by treating coordination as a memory-centered problem, demonstrating t…

View →
cs.ARRecentMay 31, 2026

Linear Complexity Fermionic Simulation on Quantum Devices with Hardware Connectivity Constraints

Xiangyu Gao, Winston Li, Jiakang Li, Zirui Li +3 more

The paper introduces Accordion, an end-to-end framework that significantly improves the efficiency of compiling fermionic Hamiltonians into quantum circuits for simulation on constrained quantum hardw…

View →
cs.ROcs.AIeess.SYRecentMay 30, 2026

PaCo-VLA: Passivity-Shielded Compliance Prior for Contact-Rich Vision-Language-Action Manipulation

Haofan Cao, Zhaoyang Li, Zhichao You, Liang Guo +1 more

PaCo-VLA introduces a passivity-shielded compliance prior to safely bridge the gap between high-level Vision-Language-Action (VLA) semantic outputs and low-level, force-sensitive robotic control.

View →
cs.AIcs.CLcs.LGRecentMay 29, 2026

COLLEAGUE.SKILL: Automated AI Skill Generation via Expert Knowledge Distillation

Tianyi Zhou, Dongrui Liu, Leitao Yuan, Jing Shao +1 more

COLLEAGUE.SKILL introduces an automated system that distills heterogeneous traces of human expertise and role-specific knowledge into portable, inspectable, and usable AI skill packages.

View →
cs.IRcs.AIRecentMay 29, 2026

Fighting Numerical Hallucinations via Data-centric Compilation for Online Financial QA

Hao Chen, Xing Tang, Qirui Liu, Weijie Shi +5 more

The paper introduces the Data-centric Reasoning Compiler (DCRC), a novel data-driven framework that enhances financial QA systems by compiling user queries and retrieved documents into verifiable, exe…

View →
eess.IVcs.AIRecentMay 29, 2026

A physics-informed foundation model for quantitative diffusion MRI

Zihan Li, Jialan Zheng, Ziyu Li, Xun Yuan +17 more

The paper introduces PIGMENT, a physics-informed foundation model that enables reliable quantitative mapping of brain microstructure from extremely sparse or challenging diffusion MRI scans.

View →
cs.CRcs.AIRecentMay 29, 2026

PrivacyPeek: Auditing What LLM-Based Agents Acquire, Not Just What They Say

Mingxuan Zhang, Jiahui Han, Dadi Guo, Songze Li +4 more

The paper introduces PrivacyPeek, a new benchmark that audits the acquisition stage of LLM-based agents to show that unnecessary and sensitive data acquisition is a widespread and critical privacy vul…

View →
cs.CRcs.AIRecentMay 29, 2026

PrivacyPeek: Auditing What LLM-Based Agents Acquire, Not Just What They Say

Mingxuan Zhang, Jiahui Han, Dadi Guo, Songze Li +4 more

The paper introduces PrivacyPeek, a new benchmark that audits the acquisition stage of LLM-based agents to demonstrate that unnecessary acquisition of sensitive data is a widespread and critical priva…

View →
cs.CRcs.LGRecentMay 29, 2026

Send a SCOUT First: Pre-hoc Reasoning for Adaptive Detector Allocation in Prompt-Injection Defense

Shuhao Zhang, Jiarui Li, Qi Cao, Ruiyi Zhang +1 more

The paper introduces SCOUT, a dynamic detector allocation framework that improves prompt-injection defense by predicting detector reliability and latency to optimize the trade-off between safety and o…

View →
cs.AIcs.CLcs.LGRecentMay 28, 2026

When Should Models Change Their Minds? Contextual Belief Management in Large Language Models

Haoming Xu, Weihong Xu, Zongrui Li, Mengru Wang +5 more

The paper introduces Contextual Belief Management (CBM) to address how LLMs should manage accumulating information over long interactions, showing that reinforcement learning significantly improves be…

View →
cs.AIcs.CLcs.CRRecentMay 28, 2026

AgentDoG 1.5: A Lightweight and Scalable Alignment Framework for AI Agent Safety and Security

Dongrui Liu, Yu Li, Zhonghao Yang, Peng Wang +46 more

The paper introduces AgentDoG 1.5, a lightweight and scalable alignment framework that significantly improves AI agent safety and security for complex, open-world agentic scenarios.

View →
cs.AIRecentMay 28, 2026

UI-KOBE: Knowledge-Oriented Behavior Exploration for Lightweight Graph-Guided GUI Agents

Yuxiang Chai, Han Xiao, Xinyu Fu, Jinpeng Chen +2 more

UI-KOBE is a framework that enhances lightweight mobile GUI agents by integrating reusable, app-specific knowledge graphs, allowing them to perform complex tasks efficiently on-device without relying…

View →
cs.AIcs.CLcs.CRRecentMay 28, 2026

AgentDoG 1.5: A Lightweight and Scalable Alignment Framework for AI Agent Safety and Security

Dongrui Liu, Yu Li, Zhonghao Yang, Peng Wang +46 more

The paper introduces AgentDoG 1.5, a lightweight and scalable alignment framework that significantly improves AI agent safety and security for complex open-world agent deployments.

View →
cs.AIRecentMay 27, 2026

Do LLMs Build World Models From Text? A Multilingual Diagnostic of Spatial Reasoning

Zhikai Pan, Chih-Ting Liao, Chunrui Liu, Xi Xiao +4 more

The paper introduces a multilingual benchmark (MentalMap) to test if LLMs build internal spatial world models from text, finding a universal 'L3 reasoning cliff' suggesting that text-only working memo…

View →