Hao Chen

47 indexed papers

Top categories

AI ×33, Crypto ×19, ML ×10, NLP ×10, Vision ×6, Info Retrieval ×3, Robotics ×3, Software Eng. ×2

Frequent co-authors

Jiahao Chen (6×)

Muhao Chen (6×)

Shouling Ji (5×)

Xiaofei Wen (5×)

Hao Cheng (4×)

Tong Zhang (4×)

Research Timeline

2026

Fighting Numerical Hallucinations via Data-centric Compilation for Online Financial QA

The paper introduces the Data-centric Reasoning Compiler (DCRC), a novel data-driven framework that enhances financial QA systems by compiling user queries and retrieved documents into verifiable, executable programs to prevent numerical hallucinations.

BilliardPhys-Bench: Benchmarking Physical Reasoning and Visual Dynamics of Multimodal LLMs

The paper introduces BilliardPhys-Bench, a new benchmark showing that current multimodal LLMs struggle with complex physical reasoning and with predicting object dynamics in simulated environments.

COMPASS: Cognitive MCTS-Guided Process Alignment for Safe Search Agents

COMPASS introduces a Cognitive MCTS-Guided Process Alignment framework to ensure robust safety for LLM search agents by identifying and supervising risky intermediate steps in multi-step reasoning.

Latent Reward Steering: An Adaptive Inference-Time Framework that Implicitly Promotes Cognitive Behaviors in Reasoning LLMs

The paper introduces Latent Reward Steering (LRS), an adaptive inference-time framework that implicitly improves the reasoning ability of LLMs by guiding the model's internal latent states based on a reward signal derived from final answer correctness.

Demystifying the Optimal Fair Classifier in Multi-Class Classification

This paper addresses the challenge of achieving optimal fairness and accuracy simultaneously in multi-class classification by proposing novel in-processing and post-processing algorithms that converge to the optimal Pareto frontier.

Recognize Your Orchestrator: An Entropy Dynamics Perspective for LLM Multi-Agent Systems

The paper proposes an Entropy Dynamics framework to analyze the stability and failure modes of centralized orchestration in Multi-Agent Systems, identifying a 'Reasoning Trap' where complex reasoning models fail due to context overload.

HomeFlow: A Data Flywheel for Smart Home Agent Training with Verifiable Simulation

The paper introduces HomeFlow, a verifiable data flywheel that procedurally generates high-quality, multi-turn training data for smart home agents, achieving state-of-the-art performance on smart home tasks.

VLMs are Good Teachers for Video Reasoning via Adaptive Test-Time Optimization

The paper proposes using Vision-Language Models (VLMs) as 'teachers' to guide Video Generation Models (VGMs) during test-time optimization, significantly improving video reasoning capabilities.

Order within Chaos: Capturing Intrinsic Energy Anomalies for AI-Manipulated Image Forgery Localization

The paper proposes FLAME, a novel framework that detects AI-generated image forgeries by identifying intrinsic energy anomalies caused by the diffusion process, achieving state-of-the-art localization.

SeClaw: Spec-Driven Security Task Synthesis for Evaluating Autonomous Agents

SeClaw is a new framework that synthesizes security tasks from structured risk specifications to evaluate autonomous LLM agents' behavior in stateful environments, focusing on the process of unsafe actions rather than just the final outcome.

OpenWebRL: Demystifying Online Multi-turn Reinforcement Learning for Visual Web Agents

The paper introduces OpenWebRL, an open framework that enables training visual web agents using online multi-turn Reinforcement Learning directly on live websites, achieving state-of-the-art performance on challenging web benchmarks.

SMH-Bench: Benchmarking LLM Agents for Environment-Grounded Reasoning and Action in Smart Homes

The paper introduces SMH-Bench, a comprehensive benchmark built on a simulator to rigorously test LLM agents' ability to perform complex, environment-grounded reasoning and actions in realistic smart-home scenarios.

SeClaw: Spec-Driven Security Task Synthesis for Evaluating Autonomous Agents

SeClaw is a new framework that uses specification-driven task synthesis to create comprehensive and controllable security benchmarks for evaluating the unsafe behaviors of autonomous LLM agents.

HORIZON: Recoverability-Governed Curriculum for Physical-Domain Scaling

This paper studies how to scale robust robot policies by expanding physical domains in a recoverable way.

Can Aggregate Invariants Accelerate Continuous Subgraph Matching? Limits, Laws, and a Dynamic Spectral Index

This paper investigates the use of spectral filtering for continuous subgraph matching over dynamic graphs and presents three key findings.

LLM-Based Invariant Testing for Software Functional Bugs

LISA is a novel LLM-based invariant testing framework for software functional bugs, achieving higher bug-detection rates and competitive code coverage than fuzzing and prior LLM-based test generation approaches.

OpenForgeRL: Train Harness-native Agents in Any Environment

OpenForgeRL is an open-source framework for training harness-based AI agents end-to-end in various environments using a lightweight proxy and Kubernetes orchestrator.

PinEqualizer: Full Funnel Content Exploration and Debiasing System at Pinterest

The authors propose a new solution for the content cold-start problem in industry-scale search and recommender systems, reducing bias, improving model prediction, and validating long-term impact.

Loom: Multi-Region Analysis of Spatial Transcriptomics with Local Neighborhoods and Global Trajectories

Loom is a system for analyzing spatial transcriptomics data through detailed pseudo-temporal exploration, cross-sample comparisons, and investigation of spatiotemporal biological mechanisms.

Sharpness-aware Model Merging with Salience Recovery for LLM-based Cross-Domain Sequential Recommendation

The paper proposes SharpRec, a framework for LLM-based Cross-Domain Sequential Recommendation to address the bottlenecks of cross-domain knowledge conflict and performance saturation in multi-domain fusion.

Papers

cs.IR, cs.LG · NEW · Empirical · Jul 28, 2026

Sharpness-aware Model Merging with Salience Recovery for LLM-based Cross-Domain Sequential Recommendation

Huwei Ji, Jiajie Su, Yuyuan Li, Xiaohua Feng +1 more

cs.IR, cs.LG