Hao Li

50 indexed papers

Top categories

AI ×38, NLP ×14, Crypto ×12, ML ×8, Vision ×7, Info Retrieval ×3, Multiagent ×3, Robotics ×2

Frequent co-authors

Bo Zhang (5×)

Wenhao Li (4×)

Chao Shen (4×)

Han Li (3×)

Jing Yang (3×)

Chenhao Lin (3×)

Research Timeline

2026

Dr. DocBench: A Comprehensive Benchmark for Expert-Level and Difficult Document Parsing

The paper introduces Dr. DocBench, a difficulty-aware, comprehensive benchmark designed to rigorously test expert-level and challenging document parsing capabilities for VLMs, demonstrating that current state-of-the-art models fail on complex, domain-specific structures.

Iteris: Agentic Research Loops for Computational Mathematics

The paper introduces Iteris, an agentic research system that generates numerical evidence, constructions, and proof drafts for open problems in computational mathematics; its outputs require validation by human experts.

On the Scaling of PEFT: Towards Million Personal Models of Trillion Parameters

The paper reframes Parameter-Efficient Fine-Tuning (PEFT) from a mere cost-saving alternative to a robust architecture for creating persistent, personalized models that layer specific behaviors onto large shared foundation models.

SafeSteer: Localized On-Policy Distillation for Efficient Safety Alignment

SafeSteer proposes a localized on-policy distillation method that restricts safety alignment to specific safety tokens, thereby achieving strong safety performance with minimal degradation to general capabilities and significantly reducing data requirements.

RoboDream: Compositional World Models for Scalable Robot Data Synthesis

RoboDream introduces an embodiment-centric world model that synthesizes photorealistic, physically feasible robot demonstrations by decoupling motion generation from environment synthesis, significantly reducing the need for expensive real-world data collection.

Improving Combined Detection and Classification of TEM Defects via Mask-Conditioned Latent Diffusion Augmentation

The paper proposes using a mask-conditioned latent diffusion model to generate synthetic, labeled TEM images for data augmentation, achieving small but measurable performance improvements in defect detection and classification.

Spatial Representation Learning Beyond Pixels: Unifying Raster Data and Vector Semantics for Human-Centric Geospatial Foundation Models

The paper advocates for a paradigm shift toward joint Spatial Representation Learning (SRL) that unifies raster imagery and structured vector data into a single embedding space for developing more semantically rich geospatial foundation models.

SeClaw: Spec-Driven Security Task Synthesis for Evaluating Autonomous Agents

SeClaw is a new framework that synthesizes security tasks from structured risk specifications to evaluate autonomous LLM agents' behavior in stateful environments, focusing on the process of unsafe actions rather than just the final outcome.

Where Do Deep-Research Agents Go Wrong? Span-Level Error Localization in Agent Trajectories

The paper introduces TELBench and the DRIFT framework to enable fine-grained, span-level error localization in deep-research agents, significantly improving the ability to pinpoint exactly where an agent's reasoning fails.

MMG2Skill: Can Agents Distill In-the-Wild Guides into Self-Evolving Skills?

The paper introduces MMG2Skill, a closed-loop framework that converts noisy, human-oriented web guides into editable, executable skills, significantly improving agent performance across diverse tasks.

MOSS-Audio Technical Report

MOSS-Audio is a unified audio-language model designed for comprehensive understanding of speech, environmental sounds, and music, achieving strong performance across various audio-grounded tasks.

TRON: Targeted Rule-Verifiable Online Environments for Visual Reasoning RL

The paper introduces TRON, an online, rule-verifiable environment substrate that generates an unbounded stream of fresh, controllable visual reasoning training instances, significantly improving RL performance on external multimodal benchmarks.

TVIR: Building Deep Research Agents Towards Text-Visual Interleaved Report Generation

The paper introduces TVIR, a benchmark and multi-agent framework for deep research that evaluates and improves the generation of factually reliable, text-visual interleaved reports.

SeClaw: Spec-Driven Security Task Synthesis for Evaluating Autonomous Agents

SeClaw is a new framework that uses specification-driven task synthesis to create comprehensive and controllable security benchmarks for evaluating the unsafe behaviors of autonomous LLM agents.

Formalizing the Binding Problem

This paper formalizes the binding problem using information theory and develops a probing method to measure binding information in deep learning representations, demonstrating that binding is crucial for strong visual recognition.

Skill-RM: Unifying Heterogeneous Evaluation Criteria via Agent Skill

The paper proposes Skill-RM, a unified framework that treats reward modeling as an agentic task to consistently integrate diverse evaluation criteria, achieving superior performance over traditional methods.

Pepper: High-bandwidth and Scalable Anonymous Broadcast with Cryptographic Privacy

Pepper is a novel, high-bandwidth anonymous broadcast protocol that achieves cryptographic sender anonymity and significantly improves messaging throughput compared to existing state-of-the-art systems.

Operation-Guided Progressive Human-to-AI Text Transformation Benchmark for Multi-Granularity AI-Text Detection

The paper introduces OpAI-Bench, a novel benchmark designed to study how AI authorship signals evolve and accumulate during the progressive co-editing process between humans and AI.

OneReason Technical Report

The paper proposes OneReason, a framework that enhances the reasoning capability of generative recommendation models by focusing on improving item perception and structuring user behavior into coherent latent interests.

ReasonAlloc: Hierarchical Decoding-Time KV Cache Budget Allocation for Reasoning Models

This paper proposes a training-free framework called ReasonAlloc to mitigate inference bottlenecks in large language models by recasting decoding-time key-value compression as a hierarchical budget allocation problem.

Papers

cs.AI · Empirical · Recent · Jun 9, 2026

ReasonAlloc: Hierarchical Decoding-Time KV Cache Budget Allocation for Reasoning Models

Wenhao Liu, Hao Shi, Yunhe Li, Weizhi Fei +6 more

cs.CL · cs.AI · cs.LG · Recent