Qi Hu
6 indexed papers
Publications per year
Top categories
Frequent co-authors
Research Timeline
AutoMIA introduces an agentic framework that automates the process of Membership Inference Attacks (MIAs) by self-exploring the attack space, achieving state-of-the-art performance without manual feature engineering.
The paper introduces a semantic validation framework that uses unpackers as executable contracts to detect and repair semantic bugs in packer identification tools, significantly improving the reliability of malware analysis.
The paper introduces MineExplorer, a new benchmark in Minecraft, to evaluate the sustained open-world exploration capabilities of MLLM agents, finding that long-horizon coordination remains a significant challenge.
The paper introduces Ryze, an automated system that synthesizes evidence-enriched Question-Answering (QA) pairs from raw biomedical papers, resulting in a specialized VLM (BioVLM-8B) that significantly outperforms existing models on biomedical benchmarks.
MemPro introduces a system-level evolution framework that treats the entire memory construction-retrieval pipeline as an evolvable program, significantly improving long-horizon agent performance over fixed-pipeline baselines.
The paper introduces SABER, a new benchmark that evaluates the operational safety of LLM coding agents in complex, stateful project environments, finding that current models have a high rate of harmful safety violations.
Papers
SABER: Benchmarking Operational Safety of LLM Coding Agents in Stateful Project Workspaces
Qi Hu, Yifeng Tang, Qinghua Wang, Lanyang Zhao +6 more
The paper introduces SABER, a new benchmark that evaluates the operational safety of LLM coding agents in complex, stateful project environments, finding that current models have a high rate of harmfu…