Built with and by Teycir Ben Soltane•
ArXivCSExplorer

Qi Hu

6 indexed papers

Recent (6 mo): 6
With code: 0
Influential cites: 0
Benchmarked: 0

Publications per year

2026: 6

Top categories

Crypto ×3, AI ×2, NLP ×2, Software Eng. ×1, Vision ×1

Frequent co-authors

Yifeng Tang (1×)
Qinghua Wang (1×)
Lanyang Zhao (1×)
Pengji Zhang (1×)
Yuhao Qing (1×)
Xin Yao (1×)

Research Timeline

2026
AutoMIA: Improved Baselines for Membership Inference Attack via Agentic Self-Exploration

AutoMIA introduces an agentic framework that automates the process of Membership Inference Attacks (MIAs) by self-exploring the attack space, achieving state-of-the-art performance without manual feature engineering.

Semantic Validation of Packer Identification Tools: Characterization, Repair, and Downstream Impact

The paper introduces a semantic validation framework that uses unpackers as executable contracts to detect and repair semantic bugs in packer identification tools, significantly improving the reliability of malware analysis.

MineExplorer: Evaluating Open-World Exploration of MLLM Agents in Minecraft

The paper introduces MineExplorer, a new benchmark in Minecraft, to evaluate the sustained open-world exploration capabilities of MLLM agents, finding that long-horizon coordination remains a significant challenge.

Ryze: Evidence-Enriched Data Synthesis from Biomedical Papers

The paper introduces Ryze, an automated system that synthesizes evidence-enriched Question-Answering (QA) pairs from raw biomedical papers, resulting in a specialized VLM (BioVLM-8B) that significantly outperforms existing models on biomedical benchmarks.

MemPro: Agentic Memory Systems as Evolvable Programs

MemPro introduces a system-level evolution framework that treats the entire memory construction-retrieval pipeline as an evolvable program, significantly improving long-horizon agent performance over fixed-pipeline baselines.

SABER: Benchmarking Operational Safety of LLM Coding Agents in Stateful Project Workspaces

The paper introduces SABER, a new benchmark that evaluates the operational safety of LLM coding agents in complex, stateful project environments, finding that current models have a high rate of harmful safety violations.


Papers

cs.SE, cs.CR | Recent | May 31, 2026

SABER: Benchmarking Operational Safety of LLM Coding Agents in Stateful Project Workspaces

Qi Hu, Yifeng Tang, Qinghua Wang, Lanyang Zhao +6 more

The paper introduces SABER, a new benchmark that evaluates the operational safety of LLM coding agents in complex, stateful project environments, finding that current models have a high rate of harmfu…

cs.AI | Recent | May 30, 2026

Ryze: Evidence-Enriched Data Synthesis from Biomedical Papers

Yeqi Huang, Yue Chen, Yanwei Ye, Guanhao Su +1 more

The paper introduces Ryze, an automated system that synthesizes evidence-enriched Question-Answering (QA) pairs from raw biomedical papers, resulting in a specialized VLM (BioVLM-8B) that significantl…

cs.CL, cs.AI | Recent | May 30, 2026

MemPro: Agentic Memory Systems as Evolvable Programs

Qingshan Liu, Guoqing Wang, Wen Wu, Jingqi Huang +4 more

MemPro introduces a system-level evolution framework that treats the entire memory construction-retrieval pipeline as an evolvable program, significantly improving long-horizon agent performance over…

cs.CL | Recent | May 29, 2026

MineExplorer: Evaluating Open-World Exploration of MLLM Agents in Minecraft

Tianjie Ju, Yueqing Sun, Zheng Wu, Wei Zhang +6 more

The paper introduces MineExplorer, a new benchmark in Minecraft, to evaluate the sustained open-world exploration capabilities of MLLM agents, finding that long-horizon coordination remains a signific…

cs.CR | Recent | May 25, 2026

Semantic Validation of Packer Identification Tools: Characterization, Repair, and Downstream Impact

Fangtian Zhong, Zhuoyun Qian, Mengfei Ren, Yili Jiang +3 more

The paper introduces a semantic validation framework that uses unpackers as executable contracts to detect and repair semantic bugs in packer identification tools, significantly improving the reliabil…

cs.CR, cs.CV | Recent | Apr 1, 2026

AutoMIA: Improved Baselines for Membership Inference Attack via Agentic Self-Exploration

Ruhao Liu, Weiqi Huang, Qi Li, Xinchao Wang

AutoMIA introduces an agentic framework that automates the process of Membership Inference Attacks (MIAs) by self-exploring the attack space, achieving state-of-the-art performance without manual feat…
