Hua Zhou

6 indexed papers

Top categories

AI ×4, Vision ×2, ML ×2, Multimedia ×1, Info Retrieval ×1, Stats ML ×1

Frequent co-authors

Qian Kou (3×)

Research Timeline

2026

Harnessing non-adversarial robustness in large language models

The paper proposes a debiasing fine-tuning technique to efficiently enhance the robustness of Large Language Models against semantically similar but textually altered prompts.

MechVQA: Benchmarking and Enhancing Multimodal LLMs on Comprehensive Mechanical Drawing Understanding

The paper introduces MechVQA, a comprehensive dataset and benchmark for mechanical drawing understanding, and proposes the MechVL model, which significantly improves Multimodal LLMs' performance on these specialized tasks.

RAFT: Data Refinement and Adaptive Distillation for Domain Fine-Tuning with Alleviated Forgetting

RAFT proposes a two-stage framework combining data refinement and adaptive distillation to improve domain-specific fine-tuning while mitigating the loss of general model capabilities.

Online Learning with Gradient-Variation Interval Regret

The paper proposes a novel online learning algorithm that achieves an interval regret bound scaling with gradient variation, providing strong theoretical guarantees for non-stationary environments.

ChartWalker: Benchmarking the Cross-Chart RAG Task

The paper introduces ChartWalker, a framework for generating challenging cross-modal analytical tasks using charts, with a hierarchical knowledge graph construction method and structure-aware sampling algorithm.

Scalable Visual Pretraining for Language Intelligence

The paper shows that visual pretraining benefits foundation model intelligence and outperforms text-only pretraining on multiple backbones and benchmarks.

Papers

cs.CV · cs.AI · cs.MM · Empirical · Recent · Jul 10, 2026

Scalable Visual Pretraining for Language Intelligence

Yiming Zhang, Zhonghan Zhao, Wenwei Zhang, Haiteng Zhao +12 more

The paper shows that visual pretraining benefits foundation model intelligence and outperforms text-only pretraining on multiple backbones and benchmarks.

cs.IR · Empirical