Rui Li
25 indexed papers
Publications per year
Top categories
Frequent co-authors
Research Timeline
The paper introduces a multilingual benchmark (MentalMap) to test if LLMs build internal spatial world models from text, finding a universal 'L3 reasoning cliff' suggesting that text-only working memory is the primary bottleneck.
The paper introduces Contextual Belief Management (CBM) to address how LLMs should manage accumulating information over long interactions, showing that reinforcement learning significantly improves belief state accuracy.
The paper introduces AgentDoG 1.5, a lightweight and scalable alignment framework that significantly improves AI agent safety and security for complex, open-world agentic scenarios.
UI-KOBE is a framework that enhances lightweight mobile GUI agents by integrating reusable, app-specific knowledge graphs, allowing them to perform complex tasks efficiently on-device without relying on large vision-language models.
The paper introduces AgentDoG 1.5, a lightweight and scalable alignment framework that significantly improves AI agent safety and security for complex open-world agent deployments.
COLLEAGUE.SKILL introduces an automated system that distills heterogeneous traces of human expertise and role-specific knowledge into portable, inspectable, and usable AI skill packages.
The paper introduces the Data-centric Reasoning Compiler (DCRC), a novel data-driven framework that enhances financial QA systems by compiling user queries and retrieved documents into verifiable, executable programs to prevent numerical hallucinations.
The paper introduces PIGMENT, a physics-informed foundation model that enables reliable quantitative mapping of brain microstructure from extremely sparse or challenging diffusion MRI scans.
The paper introduces PrivacyPeek, a new benchmark that audits the acquisition stage of LLM-based agents to show that unnecessary and sensitive data acquisition is a widespread and critical privacy vulnerability.
The paper introduces PrivacyPeek, a new benchmark that audits the acquisition stage of LLM-based agents to demonstrate that unnecessary acquisition of sensitive data is a widespread and critical privacy vulnerability.
The paper introduces SCOUT, a dynamic detector allocation framework that improves prompt-injection defense by predicting detector reliability and latency to optimize the trade-off between safety and operational utility.
PaCo-VLA introduces a passivity-shielded compliance prior to safely bridge the gap between high-level Vision-Language-Action (VLA) semantic outputs and low-level, force-sensitive robotic control.
SIRIUS-SQL introduces a robust multi-candidate text-to-SQL system that addresses weaknesses in candidate generation, error handling, and selection, achieving state-of-the-art performance on complex benchmarks.
The paper introduces TaskWeave, a hierarchical agentic framework that successfully simulates long-horizon organizational dynamics by treating coordination as a memory-centered problem, demonstrating that structured memory is key to reliable LLM-based simulations.
The paper introduces Accordion, an end-to-end framework that significantly improves the efficiency of compiling fermionic Hamiltonians into quantum circuits for simulation on constrained quantum hardware.
The paper introduces HLL, a benchmark that tests if multimodal agents can successfully substitute for human verification (like CAPTCHA) in complex, real-world workflows, finding that current agents are still brittle and fail under realistic conditions.
SafeSteer proposes a localized on-policy distillation method that restricts safety alignment to specific safety tokens, thereby achieving strong safety performance with minimal degradation to general capabilities and significantly reducing data requirements.
The paper introduces X-Stream, a new benchmark for multi-stream video understanding, and finds that current state-of-the-art MLLMs perform poorly when required to process multiple concurrent video streams.
The paper introduces Humanoid-GPT, a large-scale generative Transformer model that achieves robust zero-shot motion tracking and control by training on a massive, unified corpus of motion data.
This survey provides a systematic framework and taxonomy for evidence tracing and execution provenance in LLM agents, addressing the difficulty of verifying and auditing complex agent behaviors.
Papers
From Agent Traces to Trust: Evidence Tracing and Execution Provenance in LLM Agents
Yiqi Wang, Jiaqi Zhang, Taotao Cai, Zirui Liu +5 more
This survey provides a systematic framework and taxonomy for evidence tracing and execution provenance in LLM agents, addressing the difficulty of verifying and auditing complex agent behaviors.