Huan Zhang
8 indexed papers
Research Timeline
The paper introduces ReCAP, a native GUI agent that significantly improves CAPTCHA solving success (from 30% to 80%) by integrating specialized CAPTCHA capabilities into a general-purpose, end-to-end vision-language model.
The paper proposes VerFU, a client-verifiable federated unlearning framework for low-altitude wireless networks that allows devices to ensure the server accurately removes their historical data contributions without revealing the original data.
The paper proposes PROVFUSION, a multi-view fusion framework that integrates anomaly signals from attribute, structure, and causality views to overcome the limitations of single node- or edge-centric provenance-based intrusion detection.
The paper proposes PRAETORIAN, a novel defense mechanism for Graph Neural Networks (GNNs) that targets the intrinsic structural requirements of backdoor attacks, significantly reducing the attack success rate while maintaining high clean accuracy.
The paper introduces CoT-Guard, a small, cost-effective 4B-parameter model that significantly outperforms large, expensive monitors like GPT-5 in detecting hidden objectives in code generation tasks.
The paper proposes GloResNet, a lightweight 3D CNN that effectively predicts brain injury in preterm infants using T2-weighted MRI, achieving an average accuracy of 75.18%.
The paper introduces OpenWebRL, an open framework that enables training visual web agents using online multi-turn Reinforcement Learning directly on live websites, achieving state-of-the-art performance on challenging web benchmarks.
The paper introduces PAR3D, a unified part-aware 3D-MLLM framework, to enhance 3D scene understanding by enabling models to reason about and ground both whole objects and their fine-grained parts.
Papers
PAR3D: A Unified 3D-MLLM with Part-Aware Representation for Scene Understanding
Shaohui Dai, Yansong Qu, You Shen, Shengchuan Zhang +1 more
The paper introduces PAR3D, a unified part-aware 3D-MLLM framework, to enhance 3D scene understanding by enabling models to reason about and ground both whole objects and their fine-grained parts.