Michael Backes

7 indexed papers

Recent (6 mo)

With code

Influential cites

Benchmarked

Publications per year

Top categories

Crypto×7Vision×2NLP×2AI×2

Frequent co-authors

Yang Zhang5×

Yun Shen3×

Xinyue Shen3×

Rui Wen2×

Ziqing Yang1×

Xinlei He1×

Research Timeline

2026

When Understanding Becomes a Risk: Authenticity and Safety Risks in the Emerging Image Generation Paradigm

The paper analyzes that while multimodal large language models (MLLMs) offer superior semantic understanding for image generation, this enhanced capability significantly increases safety risks, particularly in generating unsafe content and creating harder-to-detect fake images compared to traditional diffusion models.

The Art of (Mis)alignment: How Fine-Tuning Methods Effectively Misalign and Realign LLMs in Post-Training

The paper investigates how various fine-tuning methods can be used both to intentionally misalign and subsequently realign large language models (LLMs), revealing distinct strengths for attack and defense mechanisms.

HarmfulSkillBench: How Do Harmful Skills Weaponize Your Agents?

This paper presents HarmfulSkillBench, a large-scale benchmark demonstrating that even small percentages of publicly available skills can be misused for harmful actions, significantly lowering LLM refusal rates when integrated into agent workflows.

SafeReview: Defending LLM-based Review Systems Against Adversarial Hidden Prompts

The paper proposes SafeReview, a co-evolutionary adversarial training framework that significantly improves the robustness of LLM-based peer review systems against sophisticated adversarial hidden prompts.

Pop Quiz Attack: Black-box Membership Inference Attacks Against Large Language Models

The PopQuiz Attack is a novel black-box membership inference attack that successfully tests whether large language models memorize specific training data by framing the target data as multiple-choice quiz questions.

Trust Me, Import This: Dependency Steering Attacks via Malicious Agent Skills

This paper introduces Dependency Steering, a novel attack paradigm demonstrating that malicious agent skills can actively bias LLM coding agents to use attacker-controlled packages, posing a significant, hard-to-detect software supply chain risk.

BadBone: Backdoor Attacks Against Backbone Models in Visual Prompt Learning

The paper introduces BadBone, a stealthy and adaptive backdoor attack that compromises a backbone model specifically to target downstream tasks utilizing prompt learning, demonstrating high attack success rates against state-of-the-art defenses.

Highlighted terms show continued research focus across papers

Papers

cs.CRcs.CVRecentMay 29, 2026

BadBone: Backdoor Attacks Against Backbone Models in Visual Prompt Learning

Ziqing Yang, Rui Wen, Xinlei He, Yun Shen +2 more

View →

cs.CRRecentMay 10, 2026