Yang Zhang

19 indexed papers

Top categories

AI ×10, Crypto ×9, NLP ×5, ML ×3, Vision ×3, Info Retrieval ×1, Sound ×1

Frequent co-authors

Michael Backes 5×

Yun Shen 3×

Xinyue Shen 3×

Yan Wang 2×

Yiyang Zhang 2×

Chenhao Lin 2×

Research Timeline

2026

When Understanding Becomes a Risk: Authenticity and Safety Risks in the Emerging Image Generation Paradigm

The paper finds that while multimodal large language models (MLLMs) offer superior semantic understanding for image generation, this capability significantly increases safety risks: compared to traditional diffusion models, MLLM-based generators more readily produce unsafe content and create fake images that are harder to detect.

Stealthy and Adjustable Text-Guided Backdoor Attacks on Multimodal Pretrained Models

The paper proposes a novel Text-Guided Backdoor (TGB) attack that uses common words in text descriptions as stealthy triggers for multimodal models, enhancing practicality and controllability.

Your LLM Agent Can Leak Your Data: Data Exfiltration via Backdoored Tool Use

This paper introduces Back-Reveal, an attack demonstrating that backdoored LLM agents can systematically exfiltrate sensitive user data by embedding semantic triggers into tool-use mechanisms.

The Art of (Mis)alignment: How Fine-Tuning Methods Effectively Misalign and Realign LLMs in Post-Training

The paper investigates how various fine-tuning methods can be used both to intentionally misalign and subsequently realign large language models (LLMs), revealing distinct strengths for attack and defense mechanisms.

HarmfulSkillBench: How Do Harmful Skills Weaponize Your Agents?

This paper presents HarmfulSkillBench, a large-scale benchmark demonstrating that even small percentages of publicly available skills can be misused for harmful actions, significantly lowering LLM refusal rates when integrated into agent workflows.

TwoHamsters: Benchmarking Multi-Concept Compositional Unsafety in Text-to-Image Models

This paper introduces TwoHamsters, a new benchmark that rigorously tests Multi-Concept Compositional Unsafety (MCCU) in text-to-image models, demonstrating that current state-of-the-art models and safety defenses are highly vulnerable to subtle, compositionally unsafe prompts.

Pop Quiz Attack: Black-box Membership Inference Attacks Against Large Language Models

The PopQuiz Attack is a novel black-box membership inference attack that successfully tests whether large language models memorize specific training data by framing the target data as multiple-choice quiz questions.

A First Measurement Study on Authentication Security in Real-World Remote MCP Servers

This study provides the first measurement of authentication security in real-world remote Model Context Protocol (MCP) servers, finding pervasive and critical authentication weaknesses, particularly in dynamic client registration.

Thinking as Compression: Your Reasoning Model is Secretly a Context Compressor

The paper introduces Thinking as Compression (TaC), a novel paradigm showing that the inherent reasoning process of a large language model can naturally compress long context inputs, outperforming dedicated compression methods.

GS-FUSE: Granger-Supervised Gated Fusion and Multi-Granularity Alignment for Event-Driven Financial Forecasting

GS-FUSE is a novel multimodal framework that improves financial forecasting by adaptively fusing event text and price data, achieving state-of-the-art performance by explicitly modeling the directional, causal relationship between events and market movements.

Look on Demand: A Cognitive Scheduling Framework for Visual Evidence Acquisition in Multimodal Reasoning

The paper proposes CSMR, a cognitive scheduling framework that allows a language model to dynamically decide when to acquire task-relevant visual evidence, significantly improving multimodal reasoning accuracy.

EHRBench: An Automated and Reliable EHR-based Benchmark for Clinical Decision Making with LLMs

The paper introduces EHRBench, a large-scale, automated, and reliable benchmark derived from real Electronic Health Records (EHRs) to rigorously evaluate the clinical decision-making capabilities of Large Language Models (LLMs).

When and How Human Curation Backfires: Preference Alignment under Multi-Model Self-Consuming Loop

This paper analyzes multi-model self-consuming training, showing that while human curation helps individual models, cross-model interactions can degrade long-term alignment by dampening or inverting the positive effects of human input.

Preference-Aware Rubric Learning for Personalized Evaluation

The paper introduces PARL, a framework that learns personalized evaluation rubrics directly from raw user interaction histories to accurately assess how well LLM outputs align with subjective, user-specific preferences.

BadBone: Backdoor Attacks Against Backbone Models in Visual Prompt Learning

The paper introduces BadBone, a stealthy and adaptive backdoor attack that compromises a backbone model specifically to target downstream tasks utilizing prompt learning, demonstrating high attack success rates against state-of-the-art defenses.

On the Scaling of PEFT: Towards Million Personal Models of Trillion Parameters

The paper reframes Parameter-Efficient Fine-Tuning (PEFT) from a mere cost-saving alternative to a robust architecture for creating persistent, personalized models that layer specific behaviors onto large shared foundation models.

MOSS-Audio Technical Report

MOSS-Audio is a unified audio-language model designed for comprehensive understanding of speech, environmental sounds, and music, achieving strong performance across various audio-grounded tasks.

CARTE: A Benchmark for Mapping Language Model Knowledge Across France

The paper introduces CARTE, a new benchmark designed to test how well large language models understand fine-grained, regionally differentiated knowledge across the 13 metropolitan regions of France, revealing systematic gaps in current LLM training.

OneReason Technical Report

The paper proposes OneReason, a framework that enhances the reasoning capability of generative recommendation models by focusing on improving item perception and structuring user behavior into coherent latent interests.

Papers

cs.IR, cs.AI, cs.CL · Recent · Jun 4, 2026

OneReason Technical Report

OneRec Team, Biao Yang, Boyang Ding, Chenglong Chu +80 more

cs.LG, cs.CL · Recent · Jun 1, 2026