Built with and by Teycir Ben Soltane•
How to Use•FAQ•GitHub•arXiv.org•
Share:
ArXivCSExplorer
☆☆Bookmarks🏆RSSHow to UseFAQ
Home/Authors/Qi Gu

Qi Gu

9 indexed papers

Recent (6 mo)
9
With code
0
Influential cites
0
Benchmarked
0

Publications per year

9
26

Top categories

AI×6NLP×4ML×3Crypto×3Stats ML×1Stats Theory×1Stats Comp.×1Vision×1

Frequent co-authors

Jiaqi Guo3×
Ruoqi Guo2×
Yi Liu2×
Gelei Deng2×
Yiheng Xiong2×
Yuekang Li2×

Research Timeline

2026
LinuxArena: A Control Setting for AI Agents in Live Production Software Environments

The paper introduces LinuxArena, a large-scale, diverse control setting for testing AI agents in live production environments, demonstrating its utility for evaluating both attack and defense mechanisms.

MIRAGE: Context-Aware Prompt Injection against Mobile GUI Agents via User-Generated Content

The paper introduces MIRAGE, a novel pipeline that generates context-aware prompt injection attacks by injecting malicious text into user-generated content regions of mobile screenshots, successfully demonstrating the vulnerability of current GUI agents.

MIRAGE: Context-Aware Prompt Injection against Mobile GUI Agents via User-Generated Content

The paper introduces MIRAGE, a novel pipeline that generates context-aware prompt injection attacks by embedding malicious text into user-generated content regions of mobile screenshots, successfully demonstrating the vulnerability of current VLM-driven GUI agents.

KairosAgent: Agentic Time Series Forecasting with Fused Semantic Reasoning

KairosAgent is a novel agentic framework that combines Large Language Models (LLMs) for semantic reasoning and Time Series Foundation Models (TSFMs) for numerical forecasting, achieving superior multimodal time series prediction.

A Unified and Reproducible Experimentation Framework for Speech Understanding

The paper introduces SURE, a unified framework designed to standardize and improve the comparability and reproducibility of evaluations for advanced speech understanding models.

MineExplorer: Evaluating Open-World Exploration of MLLM Agents in Minecraft

The paper introduces MineExplorer, a new benchmark in Minecraft, to evaluate the sustained open-world exploration capabilities of MLLM agents, finding that long-horizon coordination remains a significant challenge.

3DCodeBench: Benchmarking Agentic Procedural 3D Modeling Via Code

The paper introduces 3DCodeBench, a systematic benchmark and platform for evaluating Vision-Language Model (VLM) agents' ability to generate procedural 3D models from text and images using code.

Skill-RM: Unifying Heterogeneous Evaluation Criteria via Agent Skill

The paper proposes Skill-RM, a unified framework that treats reward modeling as an agentic task to consistently integrate diverse evaluation criteria, achieving superior performance over traditional methods.

Bayesian learning for the stochastic shortest path problem

The paper proposes a novel Bayesian framework to learn the optimal decision strategy for the stochastic shortest path problem by directly constructing the posterior beliefs for the action-value function $Q^*$ using Bellman's optimality equations.

Highlighted terms show continued research focus across papers

Papers

stat.MLcs.LGmath.STRecentJun 3, 2026

Bayesian learning for the stochastic shortest path problem

Chon Wai Ho, Sumeetpal S. Singh, Jiaqi Guo

The paper proposes a novel Bayesian framework to learn the optimal decision strategy for the stochastic shortest path problem by directly constructing the posterior beliefs for the action-value functi…

View →
cs.LGcs.CLRecentJun 2, 2026

Skill-RM: Unifying Heterogeneous Evaluation Criteria via Agent Skill

Tao Chen, Gangwei Jiang, Pengyu Cheng, Siyuan Huang +9 more

The paper proposes Skill-RM, a unified framework that treats reward modeling as an agentic task to consistently integrate diverse evaluation criteria, achieving superior performance over traditional m…

View →
cs.CVcs.AIcs.GRRecentMay 31, 2026

3DCodeBench: Benchmarking Agentic Procedural 3D Modeling Via Code

Yipeng Gao, Lei Shu, Genzhi Ye, Xi Xiong +4 more

The paper introduces 3DCodeBench, a systematic benchmark and platform for evaluating Vision-Language Model (VLM) agents' ability to generate procedural 3D models from text and images using code.

View →
eess.AScs.AIcs.SDRecentMay 29, 2026

A Unified and Reproducible Experimentation Framework for Speech Understanding

Jing Peng, Junhao Du, Chenghao Wang, Hanqi Li +20 more

The paper introduces SURE, a unified framework designed to standardize and improve the comparability and reproducibility of evaluations for advanced speech understanding models.

View →
cs.CLRecentMay 29, 2026

MineExplorer: Evaluating Open-World Exploration of MLLM Agents in Minecraft

Tianjie Ju, Yueqing Sun, Zheng Wu, Wei Zhang +6 more

The paper introduces MineExplorer, a new benchmark in Minecraft, to evaluate the sustained open-world exploration capabilities of MLLM agents, finding that long-horizon coordination remains a signific…

View →
cs.AIRecentMay 28, 2026

KairosAgent: Agentic Time Series Forecasting with Fused Semantic Reasoning

Kun Feng, Ziwei Shan, Yuchen Fang, Yiyang Tan +5 more

KairosAgent is a novel agentic framework that combines Large Language Models (LLMs) for semantic reasoning and Time Series Foundation Models (TSFMs) for numerical forecasting, achieving superior multi…

View →
cs.CRcs.AIcs.CLRecentMay 27, 2026

MIRAGE: Context-Aware Prompt Injection against Mobile GUI Agents via User-Generated Content

Ruoqi Guo, Yi Liu, Gelei Deng, Yiheng Xiong +6 more

The paper introduces MIRAGE, a novel pipeline that generates context-aware prompt injection attacks by injecting malicious text into user-generated content regions of mobile screenshots, successfully…

View →
cs.CRcs.AIcs.CLRecentMay 27, 2026

MIRAGE: Context-Aware Prompt Injection against Mobile GUI Agents via User-Generated Content

Ruoqi Guo, Yi Liu, Gelei Deng, Yiheng Xiong +6 more

The paper introduces MIRAGE, a novel pipeline that generates context-aware prompt injection attacks by embedding malicious text into user-generated content regions of mobile screenshots, successfully…

View →
cs.CRcs.AIcs.SERecentApr 16, 2026

LinuxArena: A Control Setting for AI Agents in Live Production Software Environments

Tyler Tracy, Ram Potham, Nick Kuhn, Myles Heller +30 more

The paper introduces LinuxArena, a large-scale, diverse control setting for testing AI agents in live production environments, demonstrating its utility for evaluating both attack and defense mechanis…

View →