ArXivCSExplorer

Hao Chen

41 indexed papers

Recent (6 mo): 41
With code: 0
Influential cites: 0
Benchmarked: 0

Publications per year (chart; extracted values: 41, 26; year labels not recoverable)

Top categories

AI ×31, Crypto ×19, NLP ×9, ML ×8, Vision ×6, Robotics ×3, Info Retrieval ×1, physics.app-ph ×1

Frequent co-authors

Jiahao Chen (6×)
Muhao Chen (6×)
Shouling Ji (5×)
Xiaofei Wen (5×)
Tong Zhang (4×)
Hao Cheng (3×)

Research Timeline

2026
Fighting Numerical Hallucinations via Data-centric Compilation for Online Financial QA

The paper introduces the Data-centric Reasoning Compiler (DCRC), a novel data-driven framework that enhances financial QA systems by compiling user queries and retrieved documents into verifiable, executable programs to prevent numerical hallucinations.

BilliardPhys-Bench: Benchmarking Physical Reasoning and Visual Dynamics of Multimodal LLMs

The paper introduces BilliardPhys-Bench, a new benchmark that demonstrates that current multimodal LLMs struggle with complex physical reasoning and predicting object dynamics in simulated environments.

COMPASS: Cognitive MCTS-Guided Process Alignment for Safe Search Agents

COMPASS introduces a Cognitive MCTS-Guided Process Alignment framework to ensure robust safety for LLM search agents by identifying and supervising risky intermediate steps in multi-step reasoning.

GaMi: Geometry-Agnostic Material Identification via Cross-Modal Subtractive Disentanglement

GaMi is a multimodal material identification system that uses mmWave and acoustic sensing with a cross-modal subtractive disentanglement framework to achieve high accuracy (95.2%) for material identification regardless of geometric variations.

Generating Graph-like Rules for Knowledge Graph Reasoning via Diffusion Models

The paper proposes GRiD, a novel framework that uses a two-phase training strategy (supervised pre-training and RL fine-tuning) to discover complex, graph-like rules for knowledge graph reasoning, overcoming limitations of existing methods.

GSAM: A Generalizable and Safe Robotic Framework for Articulated Object Manipulation

GSAM introduces a generalizable and safe robotic framework for articulated object manipulation, significantly improving success rates and reducing variability across diverse tasks by integrating commonsense reasoning and explicit collision constraints.

Simple Token-Efficient Vision-Language Model for Case-level Pathology Synoptic Report Generation

The paper introduces a simple, token-efficient vision-language model for generating comprehensive pathology synoptic reports from multiple whole-slide images (WSIs), achieving high performance while significantly reducing computational requirements.

MixFP4: Enhancing NVFP4 with Adaptive FP4/INT4 Block Representations

MixFP4 introduces a mixed micro-format extension to NVFP4, allowing blocks to dynamically select between two stored FP4 formats (E2M1 and E1M2) to improve quantization accuracy without altering the standard hardware execution path.

Triaging Threats to Specialized Guardrails

The paper introduces RouteGuard, a router-expert framework, to improve the robustness and generalization of safety guardrails by specializing threat detection across multiple distinct unsafe categories.

Latent Reward Steering: An Adaptive Inference-Time Framework that Implicitly Promotes Cognitive Behaviors in Reasoning LLMs

The paper introduces Latent Reward Steering (LRS), an adaptive inference-time framework that implicitly improves the reasoning ability of LLMs by guiding the model's internal latent states based on a reward signal derived from final answer correctness.

Demystifying the Optimal Fair Classifier in Multi-Class Classification

This paper addresses the challenge of achieving optimal fairness and accuracy simultaneously in multi-class classification by proposing novel in-processing and post-processing algorithms that converge to the optimal Pareto frontier.

Recognize Your Orchestrator: An Entropy Dynamics Perspective for LLM Multi-Agent Systems

The paper proposes an Entropy Dynamics framework to analyze the stability and failure modes of centralized orchestration in Multi-Agent Systems, identifying a 'Reasoning Trap' where complex reasoning models fail due to context overload.

HomeFlow: A Data Flywheel for Smart Home Agent Training with Verifiable Simulation

The paper introduces HomeFlow, a verifiable data flywheel that procedurally generates high-quality, multi-turn training data for smart home agents, achieving state-of-the-art performance on smart home tasks.

VLMs are Good Teachers for Video Reasoning via Adaptive Test-Time Optimization

The paper proposes using Vision-Language Models (VLMs) as 'teachers' to guide Video Generation Models (VGMs) during test-time optimization, significantly improving video reasoning capabilities.

Order within Chaos: Capturing Intrinsic Energy Anomalies for AI-Manipulated Image Forgery Localization

The paper proposes FLAME, a novel framework that detects AI-generated image forgeries by identifying intrinsic energy anomalies caused by the diffusion process, achieving state-of-the-art localization.

SeClaw: Spec-Driven Security Task Synthesis for Evaluating Autonomous Agents

SeClaw is a new framework that synthesizes security tasks from structured risk specifications to evaluate autonomous LLM agents' behavior in stateful environments, focusing on the process of unsafe actions rather than just the final outcome.

OpenWebRL: Demystifying Online Multi-turn Reinforcement Learning for Visual Web Agents

The paper introduces OpenWebRL, an open framework that enables training visual web agents using online multi-turn Reinforcement Learning directly on live websites, achieving state-of-the-art performance on challenging web benchmarks.

SMH-Bench: Benchmarking LLM Agents for Environment-Grounded Reasoning and Action in Smart Homes

The paper introduces SMH-Bench, a comprehensive benchmark built on a simulator to rigorously test LLM agents' ability to perform complex, environment-grounded reasoning and actions in realistic smart-home scenarios.

SeClaw: Spec-Driven Security Task Synthesis for Evaluating Autonomous Agents

SeClaw is a new framework that uses specification-driven task synthesis to create comprehensive and controllable security benchmarks for evaluating the unsafe behaviors of autonomous LLM agents.

HORIZON: Recoverability-Governed Curriculum for Physical-Domain Scaling

This paper studies how to scale robust robot policies by expanding physical domains in a recoverable way.


Papers

cs.RO · Recent · Jun 3, 2026

HORIZON: Recoverability-Governed Curriculum for Physical-Domain Scaling

Chenhao Bai, Liqin Lu, Kaijun Wang, Hui Chen +4 more

This paper studies how to scale robust robot policies by expanding physical domains in a recoverable way.

cs.CV · Recent · Jun 1, 2026

VLMs are Good Teachers for Video Reasoning via Adaptive Test-Time Optimization

Junhao Cheng, Liang Hou, Tianxiong Zhong, Xin Tao +3 more

The paper proposes using Vision-Language Models (VLMs) as 'teachers' to guide Video Generation Models (VGMs) during test-time optimization, significantly improving video reasoning capabilities.

cs.CV, cs.AI · Recent · Jun 1, 2026

Order within Chaos: Capturing Intrinsic Energy Anomalies for AI-Manipulated Image Forgery Localization

Yiming Wang, Baiqi Wu, Qingming Li, Jiahao Chen +2 more

The paper proposes FLAME, a novel framework that detects AI-generated image forgeries by identifying intrinsic energy anomalies caused by the diffusion process, achieving state-of-the-art localization…

cs.CR, cs.AI · Recent · Jun 1, 2026

SeClaw: Spec-Driven Security Task Synthesis for Evaluating Autonomous Agents

Hao Cheng, Changtao Miao, Tianle Song, Yin Wu +20 more

SeClaw is a new framework that synthesizes security tasks from structured risk specifications to evaluate autonomous LLM agents' behavior in stateful environments, focusing on the process of unsafe ac…

cs.LG, cs.AI, cs.CL · Recent · Jun 1, 2026

OpenWebRL: Demystifying Online Multi-turn Reinforcement Learning for Visual Web Agents

Rui Yang, Qianhui Wu, Yuxi Chen, Hao Bai +6 more

The paper introduces OpenWebRL, an open framework that enables training visual web agents using online multi-turn Reinforcement Learning directly on live websites, achieving state-of-the-art performan…

cs.AI · Recent · Jun 1, 2026

SMH-Bench: Benchmarking LLM Agents for Environment-Grounded Reasoning and Action in Smart Homes

Kuan Li, Shuo Zhang, Huacan Wang, Fangzhou Yu +11 more

The paper introduces SMH-Bench, a comprehensive benchmark built on a simulator to rigorously test LLM agents' ability to perform complex, environment-grounded reasoning and actions in realistic smart-…

cs.CR, cs.AI · Recent · Jun 1, 2026

SeClaw: Spec-Driven Security Task Synthesis for Evaluating Autonomous Agents

Hao Cheng, Changtao Miao, Tianle Song, Yin Wu +20 more

SeClaw is a new framework that uses specification-driven task synthesis to create comprehensive and controllable security benchmarks for evaluating the unsafe behaviors of autonomous LLM agents.

cs.AI · Recent · May 31, 2026

Recognize Your Orchestrator: An Entropy Dynamics Perspective for LLM Multi-Agent Systems

Junze Zhu, Weihao Chen, Xuanwang Zhang, Zhen Wu +1 more

The paper proposes an Entropy Dynamics framework to analyze the stability and failure modes of centralized orchestration in Multi-Agent Systems, identifying a 'Reasoning Trap' where complex reasoning…

cs.AI · Recent · May 31, 2026

HomeFlow: A Data Flywheel for Smart Home Agent Training with Verifiable Simulation

Yi Gu, Huacan Wang, Shuo Zhang, Yuqing Hou +9 more

The paper introduces HomeFlow, a verifiable data flywheel that procedurally generates high-quality, multi-turn training data for smart home agents, achieving state-of-the-art performance on smart home…

cs.AI · Recent · May 30, 2026

Latent Reward Steering: An Adaptive Inference-Time Framework that Implicitly Promotes Cognitive Behaviors in Reasoning LLMs

Jiakang Li, Guanyu Zhu, Can Jin, Chenxi Huang +7 more

The paper introduces Latent Reward Steering (LRS), an adaptive inference-time framework that implicitly improves the reasoning ability of LLMs by guiding the model's internal latent states based on a…

cs.LG, cs.AI · Recent · May 30, 2026

Demystifying the Optimal Fair Classifier in Multi-Class Classification

Li Zhang, Yuyuan Li, XiaoHua Feng, Jiaming Zhang +2 more

This paper addresses the challenge of achieving optimal fairness and accuracy simultaneously in multi-class classification by proposing novel in-processing and post-processing algorithms that converge…

cs.IR, cs.AI · Recent · May 29, 2026

Fighting Numerical Hallucinations via Data-centric Compilation for Online Financial QA

Hao Chen, Xing Tang, Qirui Liu, Weijie Shi +5 more

The paper introduces the Data-centric Reasoning Compiler (DCRC), a novel data-driven framework that enhances financial QA systems by compiling user queries and retrieved documents into verifiable, exe…

cs.AI, physics.app-ph · Recent · May 29, 2026

BilliardPhys-Bench: Benchmarking Physical Reasoning and Visual Dynamics of Multimodal LLMs

Ben Wang, Xiaogang Li, Ruochen Gao, Peiyao Xiao +5 more

The paper introduces BilliardPhys-Bench, a new benchmark that demonstrates that current multimodal LLMs struggle with complex physical reasoning and predicting object dynamics in simulated environment…

cs.AI · Recent · May 29, 2026

COMPASS: Cognitive MCTS-Guided Process Alignment for Safe Search Agents

Wenkai Shen, Pengyang Zhou, Jiahe Xu, Jiaming Qian +4 more

COMPASS introduces a Cognitive MCTS-Guided Process Alignment framework to ensure robust safety for LLM search agents by identifying and supervising risky intermediate steps in multi-step reasoning.

cs.ET, cs.AI, cs.SD · Recent · May 29, 2026

GaMi: Geometry-Agnostic Material Identification via Cross-Modal Subtractive Disentanglement

Zhiwei Chen, Yijie Li, Yimo Zhang, Shiyun Shao +8 more

GaMi is a multimodal material identification system that uses mmWave and acoustic sensing with a cross-modal subtractive disentanglement framework to achieve high accuracy (95.2%) for material identif…

cs.AI · Recent · May 29, 2026

Generating Graph-like Rules for Knowledge Graph Reasoning via Diffusion Models

Haoxiang Cheng, Yunfei Wang, Chao Chen, Kewei Cheng +4 more

The paper proposes GRiD, a novel framework that uses a two-phase training strategy (supervised pre-training and RL fine-tuning) to discover complex, graph-like rules for knowledge graph reasoning, ove…

cs.RO, cs.AI · Recent · May 29, 2026

GSAM: A Generalizable and Safe Robotic Framework for Articulated Object Manipulation

Beichen Shao, Mengying Xie, Heng Su, Wanyi Zhang +4 more

GSAM introduces a generalizable and safe robotic framework for articulated object manipulation, significantly improving success rates and reducing variability across diverse tasks by integrating commo…

cs.CV, cs.AI · Recent · May 29, 2026

Simple Token-Efficient Vision-Language Model for Case-level Pathology Synoptic Report Generation

Zhiyuan Yang, Jiahao Cheng, Vincent Quoc-Huy Trinh, Mahdi S. Hosseini

The paper introduces a simple, token-efficient vision-language model for generating comprehensive pathology synoptic reports from multiple whole-slide images (WSIs), achieving high performance while s…

cs.AR · Recent · May 29, 2026

MixFP4: Enhancing NVFP4 with Adaptive FP4/INT4 Block Representations

Jiaxiang Zou, Yonghao Chen, Ruilong Wu, Xinyu Chen

MixFP4 introduces a mixed micro-format extension to NVFP4, allowing blocks to dynamically select between two stored FP4 formats (E2M1 and E1M2) to improve quantization accuracy without altering the st…

cs.CR, cs.CL · Recent · May 29, 2026

Triaging Threats to Specialized Guardrails

Wenjie Jacky Mo, Xiaofei Wen, Rui Cai, Boyu Zhu +5 more

The paper introduces RouteGuard, a router-expert framework, to improve the robustness and generalization of safety guardrails by specializing threat detection across multiple distinct unsafe categorie…
