Built with and by Teycir Ben Soltane•
How to Use•FAQ•GitHub•arXiv.org•
Share:
ArXivCSExplorer
☆☆Bookmarks🏆RSSHow to UseFAQ
Home/Authors/Hao Li

Hao Li

50 indexed papers

Recent (6 mo)
50
With code
0
Influential cites
0
Benchmarked
0

Publications per year

50
26

Top categories

AI×38NLP×14Crypto×12ML×8Vision×7Info Retrieval×3Multiagent×3Robotics×2

Frequent co-authors

Bo Zhang5×
Wenhao Li4×
Chao Shen4×
Han Li3×
Jing Yang3×
Chenhao Lin3×

Research Timeline

2026
Dr. DocBench: A Comprehensive Benchmark for Expert-Level and Difficult Document Parsing

The paper introduces Dr. DocBench, a difficulty-aware, comprehensive benchmark designed to rigorously test expert-level and challenging document parsing capabilities for VLMs, demonstrating that current state-of-the-art models fail on complex, domain-specific structures.

Iteris: Agentic Research Loops for Computational Mathematics

The paper introduces Iteris, an agentic research system, demonstrating its capability to generate numerical evidence, constructions, and proof drafts for open problems in computational mathematics, requiring human expert validation.

On the Scaling of PEFT: Towards Million Personal Models of Trillion Parameters

The paper reframes Parameter-Efficient Fine-Tuning (PEFT) from a mere cost-saving alternative to a robust architecture for creating persistent, personalized models that layer specific behaviors onto large shared foundation models.

SafeSteer: Localized On-Policy Distillation for Efficient Safety Alignment

SafeSteer proposes a localized on-policy distillation method that restricts safety alignment to specific safety tokens, thereby achieving strong safety performance with minimal degradation to general capabilities and significantly reducing data requirements.

RoboDream: Compositional World Models for Scalable Robot Data Synthesis

RoboDream introduces an embodiment-centric world model that synthesizes photorealistic, physically feasible robot demonstrations by decoupling motion generation from environment synthesis, significantly reducing the need for expensive real-world data collection.

Improving Combined Detection and Classification of TEM Defects via Mask-Conditioned Latent Diffusion Augmentation

The paper proposes using a mask-conditioned latent diffusion model to generate synthetic, labeled TEM images for data augmentation, achieving small but measurable performance improvements in defect detection and classification.

Spatial Representation Learning Beyond Pixels: Unifying Raster Data and Vector Semantics for Human-Centric Geospatial Foundation Models

The paper advocates for a paradigm shift toward joint Spatial Representation Learning (SRL) that unifies raster imagery and structured vector data into a single embedding space for developing more semantically rich geospatial foundation models.

SeClaw: Spec-Driven Security Task Synthesis for Evaluating Autonomous Agents

SeClaw is a new framework that synthesizes security tasks from structured risk specifications to evaluate autonomous LLM agents' behavior in stateful environments, focusing on the process of unsafe actions rather than just the final outcome.

Where Do Deep-Research Agents Go Wrong? Span-Level Error Localization in Agent Trajectories

The paper introduces TELBench and the DRIFT framework to enable fine-grained, span-level error localization in deep-research agents, significantly improving the ability to pinpoint exactly where an agent's reasoning fails.

MMG2Skill: Can Agents Distill In-the-Wild Guides into Self-Evolving Skills?

The paper introduces MMG2Skill, a closed-loop framework that converts noisy, human-oriented web guides into editable, executable skills, significantly improving agent performance across diverse tasks.

MOSS-Audio Technical Report

MOSS-Audio is a unified audio-language model designed for comprehensive understanding of speech, environmental sounds, and music, achieving strong performance across various audio-grounded tasks.

TRON: Targeted Rule-Verifiable Online Environments for Visual Reasoning RL

The paper introduces TRON, an online, rule-verifiable environment substrate that generates an unbounded stream of fresh, controllable visual reasoning training instances, significantly improving RL performance on external multimodal benchmarks.

TVIR: Building Deep Research Agents Towards Text--Visual Interleaved Report Generation

The paper introduces TVIR, a new benchmark and multi-agent framework for deep research, to evaluate and improve the generation of factually reliable, text-visual interleaved reports.

SeClaw: Spec-Driven Security Task Synthesis for Evaluating Autonomous Agents

SeClaw is a new framework that uses specification-driven task synthesis to create comprehensive and controllable security benchmarks for evaluating the unsafe behaviors of autonomous LLM agents.

Formalizing the Binding Problem

This paper formalizes the binding problem using information theory and develops a probing method to measure binding information in deep learning representations, demonstrating that binding is crucial for strong visual recognition.

Skill-RM: Unifying Heterogeneous Evaluation Criteria via Agent Skill

The paper proposes Skill-RM, a unified framework that treats reward modeling as an agentic task to consistently integrate diverse evaluation criteria, achieving superior performance over traditional methods.

Pepper: High-bandwidth and Scalable Anonymous Broadcast with Cryptographic Privacy

Pepper is a novel, high-bandwidth anonymous broadcast protocol that achieves cryptographic sender anonymity and significantly improves messaging throughput compared to existing state-of-the-art systems.

Operation-Guided Progressive Human-to-AI Text Transformation Benchmark for Multi-Granularity AI-Text Detection

The paper introduces OpAI-Bench, a novel benchmark designed to study how AI authorship signals evolve and accumulate during the progressive co-editing process between humans and AI.

OneReason Technical Report

The paper proposes OneReason, a framework that enhances the reasoning capability of generative recommendation models by focusing on improving item perception and structuring user behavior into coherent latent interests.

ReasonAlloc: Hierarchical Decoding-Time KV Cache Budget Allocation for Reasoning Models

This paper proposes a training-free framework called ReasonAlloc to mitigate inference bottlenecks in large language models by recasting decoding-time key-value compression as a hierarchical budget allocation problem.

Highlighted terms show continued research focus across papers

Papers

cs.AIEmpiricalRecentJun 9, 2026

ReasonAlloc: Hierarchical Decoding-Time KV Cache Budget Allocation for Reasoning Models

Wenhao Liu, Hao Shi, Yunhe Li, Weizhi Fei +6 more

This paper proposes a training-free framework called ReasonAlloc to mitigate inference bottlenecks in large language models by recasting decoding-time key-value compression as a hierarchical budget al…

View →
cs.CLcs.AIcs.LGRecent
Jun 4, 2026

Operation-Guided Progressive Human-to-AI Text Transformation Benchmark for Multi-Granularity AI-Text Detection

Sondos Mahmoud Bsharat, Jiacheng Liu, Xiaohan Zhao, Tianjun Yao +8 more

The paper introduces OpAI-Bench, a novel benchmark designed to study how AI authorship signals evolve and accumulate during the progressive co-editing process between humans and AI.

View →
cs.IRcs.AIcs.CLRecentJun 4, 2026

OneReason Technical Report

OneRec Team, Biao Yang, Boyang Ding, Chenglong Chu +80 more

The paper proposes OneReason, a framework that enhances the reasoning capability of generative recommendation models by focusing on improving item perception and structuring user behavior into coheren…

View →
cs.CRRecentJun 3, 2026

Pepper: High-bandwidth and Scalable Anonymous Broadcast with Cryptographic Privacy

Chenghao Li, Haoyuan Wang, Xianghang Mi

Pepper is a novel, high-bandwidth anonymous broadcast protocol that achieves cryptographic sender anonymity and significantly improves messaging throughput compared to existing state-of-the-art system…

View →
cs.CVcs.AIcs.LGRecentJun 2, 2026

Formalizing the Binding Problem

Lianghuan Huang, Yihao Li, Saeed Salehi, Yingshan Chang +2 more

This paper formalizes the binding problem using information theory and develops a probing method to measure binding information in deep learning representations, demonstrating that binding is crucial…

View →
cs.LGcs.CLRecentJun 2, 2026

Skill-RM: Unifying Heterogeneous Evaluation Criteria via Agent Skill

Tao Chen, Gangwei Jiang, Pengyu Cheng, Siyuan Huang +9 more

The paper proposes Skill-RM, a unified framework that treats reward modeling as an agentic task to consistently integrate diverse evaluation criteria, achieving superior performance over traditional m…

View →
cs.AIcs.LGRecentJun 1, 2026

Iteris: Agentic Research Loops for Computational Mathematics

Leheng Chen, Zihao Liu, Wanyi He, Bin Dong

The paper introduces Iteris, an agentic research system, demonstrating its capability to generate numerical evidence, constructions, and proof drafts for open problems in computational mathematics, re…

View →
cs.LGcs.CLRecentJun 1, 2026

On the Scaling of PEFT: Towards Million Personal Models of Trillion Parameters

Mind Lab, :, Song Cao, Vic Cao +51 more

The paper reframes Parameter-Efficient Fine-Tuning (PEFT) from a mere cost-saving alternative to a robust architecture for creating persistent, personalized models that layer specific behaviors onto l…

View →
cs.AIcs.CLRecentJun 1, 2026

SafeSteer: Localized On-Policy Distillation for Efficient Safety Alignment

Hao Li, Jingkun An, Zijun Song, Pengyu Zhu +7 more

SafeSteer proposes a localized on-policy distillation method that restricts safety alignment to specific safety tokens, thereby achieving strong safety performance with minimal degradation to general…

View →
cs.ROcs.CVRecentJun 1, 2026

RoboDream: Compositional World Models for Scalable Robot Data Synthesis

Junjie Ye, Rong Xue, Basile Van Hoorick, Runhao Li +5 more

RoboDream introduces an embodiment-centric world model that synthesizes photorealistic, physically feasible robot demonstrations by decoupling motion generation from environment synthesis, significant…

View →
cs.CVRecentJun 1, 2026

Improving Combined Detection and Classification of TEM Defects via Mask-Conditioned Latent Diffusion Augmentation

Ni Li, Nuohao Liu, Ryan Jacobs, Ajay Annamareddy +4 more

The paper proposes using a mask-conditioned latent diffusion model to generate synthetic, labeled TEM images for data augmentation, achieving small but measurable performance improvements in defect de…

View →
cs.AIRecentJun 1, 2026

Spatial Representation Learning Beyond Pixels: Unifying Raster Data and Vector Semantics for Human-Centric Geospatial Foundation Models

Steffen Knoblauch, Hao Li, Gengchen Mai, Konstantin Klemmer +2 more

The paper advocates for a paradigm shift toward joint Spatial Representation Learning (SRL) that unifies raster imagery and structured vector data into a single embedding space for developing more sem…

View →
cs.CRcs.AIRecentJun 1, 2026

SeClaw: Spec-Driven Security Task Synthesis for Evaluating Autonomous Agents

Hao Cheng, Changtao Miao, Tianle Song, Yin Wu +20 more

SeClaw is a new framework that synthesizes security tasks from structured risk specifications to evaluate autonomous LLM agents' behavior in stateful environments, focusing on the process of unsafe ac…

View →
cs.AIRecentJun 1, 2026

Where Do Deep-Research Agents Go Wrong? Span-Level Error Localization in Agent Trajectories

Jiaming Wang, Ziteng Feng, Jiangtao Wu, Ruihao Li +7 more

The paper introduces TELBench and the DRIFT framework to enable fine-grained, span-level error localization in deep-research agents, significantly improving the ability to pinpoint exactly where an ag…

View →
cs.CLcs.AIcs.LGRecentJun 1, 2026

MMG2Skill: Can Agents Distill In-the-Wild Guides into Self-Evolving Skills?

Xinyu Che, Junqi Xiong, Yunfei Ge, Xinping Lei +9 more

The paper introduces MMG2Skill, a closed-loop framework that converts noisy, human-oriented web guides into editable, executable skills, significantly improving agent performance across diverse tasks.

View →
cs.SDcs.AIRecentJun 1, 2026

MOSS-Audio Technical Report

Chen Yang, Chufan Yu, Hanfu Chen, Jie Zhu +21 more

MOSS-Audio is a unified audio-language model designed for comprehensive understanding of speech, environmental sounds, and music, achieving strong performance across various audio-grounded tasks.

View →
cs.AIRecentJun 1, 2026

TRON: Targeted Rule-Verifiable Online Environments for Visual Reasoning RL

Tianze Yang, Yucheng Shi, Ruitong Sun, Jingyuan Huang +2 more

The paper introduces TRON, an online, rule-verifiable environment substrate that generates an unbounded stream of fresh, controllable visual reasoning training instances, significantly improving RL pe…

View →
cs.CLRecentJun 1, 2026

TVIR: Building Deep Research Agents Towards Text--Visual Interleaved Report Generation

Xinkai Ma, Zhiqi Bai, Dingling Zhang, Pei Liu +20 more

The paper introduces TVIR, a new benchmark and multi-agent framework for deep research, to evaluate and improve the generation of factually reliable, text-visual interleaved reports.

View →
cs.CRcs.AIRecentJun 1, 2026

SeClaw: Spec-Driven Security Task Synthesis for Evaluating Autonomous Agents

Hao Cheng, Changtao Miao, Tianle Song, Yin Wu +20 more

SeClaw is a new framework that uses specification-driven task synthesis to create comprehensive and controllable security benchmarks for evaluating the unsafe behaviors of autonomous LLM agents.

View →
cs.CLcs.AIcs.CVRecentMay 31, 2026

Dr. DocBench: A Comprehensive Benchmark for Expert-Level and Difficult Document Parsing

Minglai Yang, Xinyan Velocity Yu, Pengyuan Li, Xinyu Guo +21 more

The paper introduces Dr. DocBench, a difficulty-aware, comprehensive benchmark designed to rigorously test expert-level and challenging document parsing capabilities for VLMs, demonstrating that curre…

View →