ArXivCSExplorer

Gelei Deng

11 indexed papers

Recent (6 mo): 11
With code: 0
Influential cites: 0
Benchmarked: 0

Publications per year

2026: 11

Top categories

Crypto ×11, AI ×10, NLP ×6, Software Eng. ×2, ML ×1, Social Networks ×1

Frequent co-authors

Yi Liu (8×)
Yuekang Li (8×)
Ying Zhang (7×)
Leo Yu Zhang (7×)
Yubin Qu (4×)
Yanjun Zhang (4×)

Research Timeline

2026
Mind Your HEARTBEAT! Claw Background Execution Inherently Enables Silent Memory Pollution

The paper identifies that background 'heartbeat' execution in personal AI agents like Claw can silently pollute the agent's memory with external misinformation, influencing user behavior without the user's knowledge or explicit prompt injection.

AutoEG: Exploiting Known Third-Party Vulnerabilities in Black-Box Web Applications

The paper introduces AutoEG, a fully automated multi-agent framework that significantly improves the exploitation of known third-party vulnerabilities in black-box web applications, achieving an 82.41% average success rate.

Supply-Chain Poisoning Attacks Against LLM Coding Agent Skill Ecosystems

The paper introduces Document-Driven Implicit Payload Execution (DDIPE) to demonstrate that malicious code can be embedded in LLM agent skill documentation, allowing supply-chain attacks to hijack agent actions without explicit prompts.

Credential Leakage in LLM Agent Skills: A Large-Scale Empirical Study

This study conducts a large-scale empirical analysis of third-party LLM agent skills, identifying that credential leakage is a pervasive, cross-modal issue primarily caused by debug logging and resulting in exploitable, persistent secrets.

Membership Inference Attacks Against Video Large Language Models

This paper presents a black-box membership inference attack (MIA) against Video Large Language Models (VideoLLMs) that analyzes generation behavior across varying decoding temperatures, demonstrating that these models are vulnerable.

Overeager Coding Agents: Measuring Out-of-Scope Actions on Benign Tasks

The paper introduces OverEager-Gen, a new benchmark that measures 'overeager actions'—where coding agents perform unauthorized tasks beyond a benign request—and finds that removing explicit consent declarations significantly increases this overeager behavior across multiple agents.

Turning Bias into Bugs: Bandit-Guided Style Manipulation Attacks on LLM Judges

The paper introduces BITE, a black-box adversarial framework that exploits stylistic biases in LLM judges by adaptively generating semantically equivalent edits to artificially inflate assigned scores.

SNARE: Adaptive Scenario Synthesis for Eliciting Overeager Behavior in Coding Agents

The paper introduces SNARE, a novel adaptive testing pipeline that systematically measures overeager behavior in coding agents, finding that the agent framework accounts for the majority of the variation in security risk.

MIRAGE: Context-Aware Prompt Injection against Mobile GUI Agents via User-Generated Content

The paper introduces MIRAGE, a novel pipeline that generates context-aware prompt injection attacks by injecting malicious text into user-generated content regions of mobile screenshots, successfully demonstrating the vulnerability of current GUI agents.

SNARE: Adaptive Scenario Synthesis for Eliciting Overeager Behavior in Coding Agents

The paper introduces SNARE, a novel adaptive benchmarking pipeline that systematically measures overeager behavior in coding agents, finding that the agent framework accounts for the majority of the variation in security risk.

MIRAGE: Context-Aware Prompt Injection against Mobile GUI Agents via User-Generated Content

The paper introduces MIRAGE, a novel pipeline that generates context-aware prompt injection attacks by embedding malicious text into user-generated content regions of mobile screenshots, successfully demonstrating the vulnerability of current VLM-driven GUI agents.


Papers

cs.CR · cs.AI · cs.CL · May 27, 2026

SNARE: Adaptive Scenario Synthesis for Eliciting Overeager Behavior in Coding Agents

Yubin Qu, Yi Liu, Gelei Deng, Yanjun Zhang +3 more

The paper introduces SNARE, a novel adaptive testing pipeline that systematically measures overeager behavior in coding agents, finding that the agent framework accounts for the majority of the variat…

cs.CR · cs.AI · cs.CL · May 27, 2026

MIRAGE: Context-Aware Prompt Injection against Mobile GUI Agents via User-Generated Content

Ruoqi Guo, Yi Liu, Gelei Deng, Yiheng Xiong +6 more

The paper introduces MIRAGE, a novel pipeline that generates context-aware prompt injection attacks by injecting malicious text into user-generated content regions of mobile screenshots, successfully…

cs.CR · cs.AI · cs.CL · May 27, 2026

SNARE: Adaptive Scenario Synthesis for Eliciting Overeager Behavior in Coding Agents

Yubin Qu, Yi Liu, Gelei Deng, Yanjun Zhang +3 more

The paper introduces SNARE, a novel adaptive benchmarking pipeline that systematically measures overeager behavior in coding agents, finding that the agent framework accounts for the majority of the v…

cs.CR · cs.AI · cs.CL · May 27, 2026

MIRAGE: Context-Aware Prompt Injection against Mobile GUI Agents via User-Generated Content

Ruoqi Guo, Yi Liu, Gelei Deng, Yiheng Xiong +6 more

The paper introduces MIRAGE, a novel pipeline that generates context-aware prompt injection attacks by embedding malicious text into user-generated content regions of mobile screenshots, successfully…

cs.CR · cs.AI · cs.LG · May 24, 2026

Turning Bias into Bugs: Bandit-Guided Style Manipulation Attacks on LLM Judges

Xianglin Yang, Bryan Hooi, Gelei Deng, Tianwei Zhang +1 more

The paper introduces BITE, a black-box adversarial framework that exploits stylistic biases in LLM judges by adaptively generating semantically equivalent edits to artificially inflate assigned scores…

cs.SE · cs.AI · cs.CL · May 18, 2026

Overeager Coding Agents: Measuring Out-of-Scope Actions on Benign Tasks

Yubin Qu, Ying Zhang, Yanjun Zhang, Gelei Deng +3 more

The paper introduces OverEager-Gen, a new benchmark that measures 'overeager actions'—where coding agents perform unauthorized tasks beyond a benign request—and finds that removing explicit consent de…

cs.CR · Apr 29, 2026

Membership Inference Attacks Against Video Large Language Models

Wei Song, Yuxin Cao, Ziqi Ding, Yi Liu +2 more

This paper presents a black-box membership inference attack (MIA) against Video Large Language Models (VideoLLMs), demonstrating that they are vulnerable by analyzing generation behavior across varyin…

cs.CR · cs.AI · cs.CL · Apr 3, 2026

Supply-Chain Poisoning Attacks Against LLM Coding Agent Skill Ecosystems

Yubin Qu, Yi Liu, Tongcheng Geng, Gelei Deng +4 more

The paper introduces Document-Driven Implicit Payload Execution (DDIPE) to demonstrate that malicious code can be embedded in LLM agent skill documentation, allowing supply-chain attacks to hijack age…

cs.CR · cs.AI · Apr 3, 2026

Credential Leakage in LLM Agent Skills: A Large-Scale Empirical Study

Zhihao Chen, Ying Zhang, Yi Liu, Gelei Deng +6 more

This study conducts a large-scale empirical analysis of third-party LLM agent skills, identifying that credential leakage is a pervasive, cross-modal issue primarily caused by debug logging and result…

cs.CR · cs.AI · cs.SE · Apr 1, 2026

AutoEG: Exploiting Known Third-Party Vulnerabilities in Black-Box Web Applications

Ruozhao Yang, Mingfei Cheng, Gelei Deng, Junjie Wang +2 more

The paper introduces AutoEG, a fully automated multi-agent framework that significantly improves the exploitation of known third-party vulnerabilities in black-box web applications by achieving an 82.…

cs.CR · cs.AI · cs.SI · Mar 24, 2026

Mind Your HEARTBEAT! Claw Background Execution Inherently Enables Silent Memory Pollution

Yechao Zhang, Shiqian Zhao, Jie Zhang, Gelei Deng +4 more

The paper identifies that background 'heartbeat' execution in personal AI agents like Claw can silently pollute the agent's memory with external misinformation, influencing user behavior without the u…
