Rui Liu

14 indexed papers

Recent (6 mo)

With code

Influential cites

Benchmarked

Publications per year

Top categories

AI×13Crypto×6ML×5NLP×4Vision×4Multimedia×1Info Retrieval×1Stats ML×1

Frequent co-authors

Dongrui Liu8×

Xia Hu5×

Leitao Yuan4×

Jing Shao4×

Wenjie Wang4×

Wen Shen3×

Research Timeline

2026

Frequency-Domain Regularized Adversarial Alignment for Transferable Attacks against Closed-Source MLLMs

The paper proposes FRA-Attack, a frequency-domain regularization method, to significantly improve the transferability of adversarial attacks against closed-source Multimodal Large Language Models (MLLMs).

Do LLMs Build World Models From Text? A Multilingual Diagnostic of Spatial Reasoning

The paper introduces a multilingual benchmark (MentalMap) to test if LLMs build internal spatial world models from text, finding a universal 'L3 reasoning cliff' suggesting that text-only working memory is the primary bottleneck.

Plant, Persist, Trigger: Sleeper Attack on Large Language Model Agents

This paper introduces the concept of 'Sleeper Attack,' demonstrating that adversarial content can persist across multiple interactions with an LLM agent, posing a more subtle and difficult-to-detect safety threat than single-interaction attacks.

AgentDoG 1.5: A Lightweight and Scalable Alignment Framework for AI Agent Safety and Security

The paper introduces AgentDoG 1.5, a lightweight and scalable alignment framework that significantly improves AI agent safety and security for complex, open-world agentic scenarios.

UI-KOBE: Knowledge-Oriented Behavior Exploration for Lightweight Graph-Guided GUI Agents

UI-KOBE is a framework that enhances lightweight mobile GUI agents by integrating reusable, app-specific knowledge graphs, allowing them to perform complex tasks efficiently on-device without relying on large vision-language models.

AgentDoG 1.5: A Lightweight and Scalable Alignment Framework for AI Agent Safety and Security

The paper introduces AgentDoG 1.5, a lightweight and scalable alignment framework that significantly improves AI agent safety and security for complex open-world agent deployments.

COLLEAGUE.SKILL: Automated AI Skill Generation via Expert Knowledge Distillation

COLLEAGUE.SKILL introduces an automated system that distills heterogeneous traces of human expertise and role-specific knowledge into portable, inspectable, and usable AI skill packages.

Fighting Numerical Hallucinations via Data-centric Compilation for Online Financial QA

The paper introduces the Data-centric Reasoning Compiler (DCRC), a novel data-driven framework that enhances financial QA systems by compiling user queries and retrieved documents into verifiable, executable programs to prevent numerical hallucinations.

PrivacyPeek: Auditing What LLM-Based Agents Acquire, Not Just What They Say

The paper introduces PrivacyPeek, a new benchmark that audits the acquisition stage of LLM-based agents to show that unnecessary and sensitive data acquisition is a widespread and critical privacy vulnerability.

PrivacyPeek: Auditing What LLM-Based Agents Acquire, Not Just What They Say

The paper introduces PrivacyPeek, a new benchmark that audits the acquisition stage of LLM-based agents to demonstrate that unnecessary acquisition of sensitive data is a widespread and critical privacy vulnerability.

Can LLM Agents Sustain Long-Horizon Organizational Dynamics?

The paper introduces TaskWeave, a hierarchical agentic framework that successfully simulates long-horizon organizational dynamics by treating coordination as a memory-centered problem, demonstrating that structured memory is key to reliable LLM-based simulations.

HLL: Can Agents Cross Humanity's Last Line of Verification?

The paper introduces HLL, a benchmark that tests if multimodal agents can successfully substitute for human verification (like CAPTCHA) in complex, real-world workflows, finding that current agents are still brittle and fail under realistic conditions.

X-Stream: Exploring MLLMs as Multiplexers for Multi-Stream Understanding

The paper introduces X-Stream, a new benchmark for multi-stream video understanding, and finds that current state-of-the-art MLLMs perform poorly when required to process multiple concurrent video streams.

From Agent Traces to Trust: Evidence Tracing and Execution Provenance in LLM Agents

This survey provides a systematic framework and taxonomy for evidence tracing and execution provenance in LLM agents, addressing the difficulty of verifying and auditing complex agent behaviors.

Highlighted terms show continued research focus across papers

Papers

cs.CRcs.AIRecentJun 3, 2026

From Agent Traces to Trust: Evidence Tracing and Execution Provenance in LLM Agents

Yiqi Wang, Jiaqi Zhang, Taotao Cai, Zirui Liu +5 more

This survey provides a systematic framework and taxonomy for evidence tracing and execution provenance in LLM agents, addressing the difficulty of verifying and auditing complex agent behaviors.

View →

cs.AIcs.CLcs.CVRecentJun 1, 2026