Built by Teycir Ben Soltane
ArXivCSExplorer

Hao Liu

27 indexed papers

Recent (6 mo): 27
With code: 0
Influential cites: 0
Benchmarked: 0

Publications per year

2026: 27

Top categories

AI ×18, Crypto ×11, NLP ×5, Vision ×4, Software Eng. ×3, ML ×2, Info Retrieval ×1, Sound ×1

Frequent co-authors

Zhao Liu (2×)
Ke Chen (2×)
Chenghao Liu (2×)
Minghao Liu (2×)
Wanhao Liu (2×)
Jiaqing Xie (2×)

Research Timeline

2026
Why Do Aligned LLMs Remain Jailbreakable: Refusal-Escape Directions, Operator-Level Sources, and Safety-Utility Trade-off

The paper theorizes that aligned LLMs remain jailbreakable due to 'Refusal-Escape Directions' (RED), which are continuous perturbation paths that shift model behavior from refusal to answering, and shows this vulnerability is linked to specific operator-level sources within the model architecture.

Safety Context Injection: Inference-Time Safety Alignment via Static Filtering and Agentic Analysis

The paper proposes Safety Context Injection (SCI), an inference-time framework that prepends a structured external risk report to protect Large Reasoning Models (LRMs) against sophisticated jailbreaks, significantly reducing attack success rates.

Do Coding Agents Understand Least-Privilege Authorization?

The paper introduces a new benchmark and decomposition method, Sufficiency-Tightness Decomposition, demonstrating that current coding agents struggle to accurately infer least-privilege authorization, and that this decomposition significantly improves both security and task success.

Inverting the Shield: Systematically Generating Safety Tests from Policy Specifications

The paper introduces POLARIS, a novel framework that systematically generates comprehensive and verifiable safety tests for LLMs by formalizing natural language policies into First-Order Logic and exploring the resulting Semantic Policy Graph.

VFEAgent: A Multimodal Agent Framework for End-to-End Automated Finite Element Analysis

VFEAgent is a novel multi-agent framework that automates the entire Finite Element Analysis (FEA) workflow, achieving high success rates in generating complete and physically valid simulations directly from multimodal inputs.

HRBench: Benchmarking and Understanding Thinking-Mode Switch Strategies in Hybrid-Reasoning LLMs

The paper introduces HRBench, a unified and comprehensive evaluation framework for systematically benchmarking and comparing various thinking-mode switching strategies in hybrid-reasoning LLMs.

OmniMatBench: A Human-Calibrated Multimodal Reasoning Benchmark Across 19 Materials Science Subfields

The paper introduces OmniMatBench, a comprehensive, human-calibrated multimodal reasoning benchmark covering 19 materials science subfields, revealing that current multimodal language models (MLLMs) have significant gaps in complex materials-science reasoning.

SkillsInjector: Dynamic Skill Context Construction for LLM Agents

SkillsInjector proposes a two-stage adaptive method to dynamically optimize skill selection, quantity, and presentation for LLM agents, significantly improving task performance over static injection methods.

AnyMo: Scaling Any-Modality Conditional Motion Generation with Masked Modeling

The paper introduces AnyMo, a unified multimodal framework that enables high-quality, scalable conditional human motion generation by leveraging a massive, cross-modal dataset and a masked modeling transformer.

Aligned but Fragile: Enhancing LLM Safety Robustness via Zeroth-Order Optimization

The paper proposes a novel zeroth-order optimization framework to enhance the robustness of LLM safety alignment, showing that a few refinement steps can significantly improve safety while maintaining utility.

TRACE: Discovering Task-Specific Parameter via Adaptation-Aware Probing for Continual Fine-Tuning

TRACE proposes a novel method to mitigate catastrophic forgetting in continual LLM fine-tuning by identifying and isolating a small, task-specific subset of essential parameters for each task.

Dr. DocBench: A Comprehensive Benchmark for Expert-Level and Difficult Document Parsing

The paper introduces Dr. DocBench, a difficulty-aware, comprehensive benchmark designed to rigorously test expert-level and challenging document parsing capabilities for VLMs, demonstrating that current state-of-the-art models fail on complex, domain-specific structures.

Iteris: Agentic Research Loops for Computational Mathematics

The paper introduces Iteris, an agentic research system that generates numerical evidence, constructions, and proof drafts for open problems in computational mathematics; its outputs require validation by human experts.

Improving Combined Detection and Classification of TEM Defects via Mask-Conditioned Latent Diffusion Augmentation

The paper proposes using a mask-conditioned latent diffusion model to generate synthetic, labeled TEM images for data augmentation, achieving small but measurable performance improvements in defect detection and classification.

MOSS-Audio Technical Report

MOSS-Audio is a unified audio-language model designed for comprehensive understanding of speech, environmental sounds, and music, achieving strong performance across various audio-grounded tasks.

TRON: Targeted Rule-Verifiable Online Environments for Visual Reasoning RL

The paper introduces TRON, an online, rule-verifiable environment substrate that generates an unbounded stream of fresh, controllable visual reasoning training instances, significantly improving RL performance on external multimodal benchmarks.

TVIR: Building Deep Research Agents Towards Text–Visual Interleaved Report Generation

The paper introduces TVIR, a new benchmark and multi-agent framework for deep research, to evaluate and improve the generation of factually reliable, text-visual interleaved reports.

Skill-RM: Unifying Heterogeneous Evaluation Criteria via Agent Skill

The paper proposes Skill-RM, a unified framework that treats reward modeling as an agentic task to consistently integrate diverse evaluation criteria, achieving superior performance over traditional methods.

OneReason Technical Report

The paper proposes OneReason, a framework that enhances the reasoning capability of generative recommendation models by focusing on improving item perception and structuring user behavior into coherent latent interests.

ReasonAlloc: Hierarchical Decoding-Time KV Cache Budget Allocation for Reasoning Models

This paper proposes a training-free framework called ReasonAlloc to mitigate inference bottlenecks in large language models by recasting decoding-time key-value compression as a hierarchical budget allocation problem.


Papers

cs.AI · Empirical · Recent · Jun 9, 2026

ReasonAlloc: Hierarchical Decoding-Time KV Cache Budget Allocation for Reasoning Models

Wenhao Liu, Hao Shi, Yunhe Li, Weizhi Fei +6 more

This paper proposes a training-free framework called ReasonAlloc to mitigate inference bottlenecks in large language models by recasting decoding-time key-value compression as a hierarchical budget al…

cs.IR · cs.AI · cs.CL · Recent · Jun 4, 2026

OneReason Technical Report

OneRec Team, Biao Yang, Boyang Ding, Chenglong Chu +80 more

The paper proposes OneReason, a framework that enhances the reasoning capability of generative recommendation models by focusing on improving item perception and structuring user behavior into coheren…

cs.LG · cs.CL · Recent · Jun 2, 2026

Skill-RM: Unifying Heterogeneous Evaluation Criteria via Agent Skill

Tao Chen, Gangwei Jiang, Pengyu Cheng, Siyuan Huang +9 more

The paper proposes Skill-RM, a unified framework that treats reward modeling as an agentic task to consistently integrate diverse evaluation criteria, achieving superior performance over traditional m…

cs.AI · cs.LG · Recent · Jun 1, 2026

Iteris: Agentic Research Loops for Computational Mathematics

Leheng Chen, Zihao Liu, Wanyi He, Bin Dong

The paper introduces Iteris, an agentic research system, demonstrating its capability to generate numerical evidence, constructions, and proof drafts for open problems in computational mathematics, re…

cs.CV · Recent · Jun 1, 2026

Improving Combined Detection and Classification of TEM Defects via Mask-Conditioned Latent Diffusion Augmentation

Ni Li, Nuohao Liu, Ryan Jacobs, Ajay Annamareddy +4 more

The paper proposes using a mask-conditioned latent diffusion model to generate synthetic, labeled TEM images for data augmentation, achieving small but measurable performance improvements in defect de…

cs.SD · cs.AI · Recent · Jun 1, 2026

MOSS-Audio Technical Report

Chen Yang, Chufan Yu, Hanfu Chen, Jie Zhu +21 more

MOSS-Audio is a unified audio-language model designed for comprehensive understanding of speech, environmental sounds, and music, achieving strong performance across various audio-grounded tasks.

cs.AI · Recent · Jun 1, 2026

TRON: Targeted Rule-Verifiable Online Environments for Visual Reasoning RL

Tianze Yang, Yucheng Shi, Ruitong Sun, Jingyuan Huang +2 more

The paper introduces TRON, an online, rule-verifiable environment substrate that generates an unbounded stream of fresh, controllable visual reasoning training instances, significantly improving RL pe…

cs.CL · Recent · Jun 1, 2026

TVIR: Building Deep Research Agents Towards Text–Visual Interleaved Report Generation

Xinkai Ma, Zhiqi Bai, Dingling Zhang, Pei Liu +20 more

The paper introduces TVIR, a new benchmark and multi-agent framework for deep research, to evaluate and improve the generation of factually reliable, text-visual interleaved reports.

cs.CL · cs.AI · cs.CV · Recent · May 31, 2026

Dr. DocBench: A Comprehensive Benchmark for Expert-Level and Difficult Document Parsing

Minglai Yang, Xinyan Velocity Yu, Pengyuan Li, Xinyu Guo +21 more

The paper introduces Dr. DocBench, a difficulty-aware, comprehensive benchmark designed to rigorously test expert-level and challenging document parsing capabilities for VLMs, demonstrating that curre…

cs.CL · Recent · May 29, 2026

TRACE: Discovering Task-Specific Parameter via Adaptation-Aware Probing for Continual Fine-Tuning

Xiaosong Han, Ke Chen, Xindi Dai, Di Liang +6 more

TRACE proposes a novel method to mitigate catastrophic forgetting in continual LLM fine-tuning by identifying and isolating a small, task-specific subset of essential parameters for each task.

cs.AI · Recent · May 28, 2026

OmniMatBench: A Human-Calibrated Multimodal Reasoning Benchmark Across 19 Materials Science Subfields

Wanhao Liu, Jiaqing Xie, Qian Tan, Weida Wang +9 more

The paper introduces OmniMatBench, a comprehensive, human-calibrated multimodal reasoning benchmark covering 19 materials science subfields, revealing that current multimodal language models (MLLMs) h…

cs.AI · Recent · May 28, 2026

SkillsInjector: Dynamic Skill Context Construction for LLM Agents

Yanchao Li, Wanhao Liu, Ben Gao, Jiaqing Xie +4 more

SkillsInjector proposes a two-stage adaptive method to dynamically optimize skill selection, quantity, and presentation for LLM agents, significantly improving task performance over static injection m…

cs.CV · cs.AI · Recent · May 28, 2026

AnyMo: Scaling Any-Modality Conditional Motion Generation with Masked Modeling

Yiheng Li, Zhuo Li, Ruibing Hou, Yingjie Chen +3 more

The paper introduces AnyMo, a unified multimodal framework that enables high-quality, scalable conditional human motion generation by leveraging a massive, cross-modal dataset and a masked modeling tr…

cs.AI · Recent · May 28, 2026

Aligned but Fragile: Enhancing LLM Safety Robustness via Zeroth-Order Optimization

Zhihao Liu, Yifan Wu, Jian Lou, Di Wang +2 more

The paper proposes a novel zeroth-order optimization framework to enhance the robustness of LLM safety alignment, showing that a few refinement steps can significantly improve safety while maintaining u…

cs.AI · cs.CE · Recent · May 27, 2026

VFEAgent: A Multimodal Agent Framework for End-to-End Automated Finite Element Analysis

Jiachen Zhang, Junyi Lao, Chenghao Liu, Siyuan Liu +4 more

VFEAgent is a novel multi-agent framework that automates the entire Finite Element Analysis (FEA) workflow, achieving high success rates in generating complete and physically valid simulations directl…

cs.AI · Recent · May 27, 2026

HRBench: Benchmarking and Understanding Thinking-Mode Switch Strategies in Hybrid-Reasoning LLMs

Yansong Ning, Mianpeng Liu, Jingwen Ye, Weidong Zhang +1 more

The paper introduces HRBench, a unified and comprehensive evaluation framework for systematically benchmarking and comparing various thinking-mode switching strategies in hybrid-reasoning LLMs.

cs.AI · cs.CR · cs.SE · Recent · May 24, 2026

Inverting the Shield: Systematically Generating Safety Tests from Policy Specifications

Xiaoyue Lu, Xianglin Yang, Haijun Liu, Jiahao Liu +3 more

The paper introduces POLARIS, a novel framework that systematically generates comprehensive and verifiable safety tests for LLMs by formalizing natural language policies into First-Order Logic and exp…

cs.CR · cs.AI · Recent · May 14, 2026

Do Coding Agents Understand Least-Privilege Authorization?

Zheng Yan, Jingxiang Weng, Charles Chen, Dengyun Peng +8 more

The paper introduces a new benchmark and decomposition method, Sufficiency-Tightness Decomposition, demonstrating that current coding agents struggle to accurately infer least-privilege authorization,…

cs.CR · Recent · May 12, 2026

Safety Context Injection: Inference-Time Safety Alignment via Static Filtering and Agentic Analysis

Zhenhao Xu, Wenhan Chang, Yichuan Chen, Yuxin Fang +2 more

The paper proposes Safety Context Injection (SCI), an inference-time framework that prepends a structured external risk report to protect Large Reasoning Models (LRMs) against sophisticated jailbreaks…

cs.CR · cs.AI · Recent · May 9, 2026

Why Do Aligned LLMs Remain Jailbreakable: Refusal-Escape Directions, Operator-Level Sources, and Safety-Utility Trade-off

Yu Chen, Yuanhao Liu, Qi Cao

The paper theorizes that aligned LLMs remain jailbreakable due to 'Refusal-Escape Directions' (RED), which are continuous perturbation paths that shift model behavior from refusal to answering, and sh…
