Wei Yang

11 indexed papers

Top categories

AI ×8, Crypto ×6, ML ×6, NLP ×3, Software Eng. ×1, Multiagent ×1, Info Retrieval ×1, Vision ×1

Frequent co-authors

Wei Yang Bryan Lim (2×)

Bin Duan (1×)

Zeyu Bai (1×)

Guowei Yang (1×)

Huayi Lai (1×)

Shichao Song (1×)

Research Timeline

2026

ACRFence: Preventing Semantic Rollback Attacks in Agent Checkpoint-Restore

ACRFence introduces a framework-agnostic mitigation to prevent semantic rollback attacks in LLM agents by recording irreversible tool effects and enforcing strict replay-or-fork semantics upon checkpoint restoration.

Tracing the Dynamics of Refusal: Exploiting Latent Refusal Trajectories for Robust Jailbreak Detection

The paper proposes SALO, a detector that monitors the layer-wise activation pattern (the Refusal Trajectory) to make jailbreak detection more robust than traditional methods that rely on static terminal representations.

AESOP: Adversarial Execution-path Selection to Overload Deep Learning Pipelines

AESOP introduces an adversarial attack that targets the entire execution path of deep learning pipelines, demonstrating that path-aware selection can inflate computational costs by orders of magnitude more than single-model attacks.

FraudBench: A Multimodal Benchmark for Detecting AI-Generated Fraudulent Refund Evidence

The paper introduces FraudBench, a multimodal benchmark designed to detect AI-generated fraudulent refund evidence, finding that current AI models struggle significantly with claim-conditioned fake-damage detection.

Jailbreak susceptibility prediction and mitigation via the behavioral geometry of models

The paper introduces a framework using the 'behavioral geometry' of model populations to efficiently predict jailbreak susceptibility and transfer defenses, achieving high accuracy with significantly fewer evaluations.

Toward User Preference Alignment in LLM Recommendation via Explicit Context Feedback

The paper advocates for integrating explicit contextual feedback (like reviews and comments) into LLM-based recommender systems to achieve more personalized, transparent, and semantically aligned recommendations.

Why Specialist Models Still Matter: A Heterogeneous Multi-Agent Paradigm for Medical Artificial Intelligence

The paper proposes HetMedAgent, a multi-agent framework, demonstrating that combining generalist LLMs with domain-specific specialist models significantly improves medical AI performance by enabling structured collaboration.

ReasonLight: A Multimodal Foundation Model-Enhanced Reinforcement Learning Framework for Zero-Shot Traffic Signal Control

ReasonLight is a multimodal foundation model-enhanced RL framework that enables zero-shot traffic signal control by semantically refining RL-proposed actions using heterogeneous sensor and camera data.

Internalize the Temperature: On-Policy Self-Distillation as Policy Reheater for Reinforcement Learning

The paper introduces Temperature-Scaled On-Policy Self-Distillation (TS-OPSD), a method that internalizes temperature-based policy reheating into model parameters to combat entropy collapse in reinforcement learning.

RoleCDE: Benchmarking and Mitigating Role-Alignment Trade-offs in Role-Playing Agents

The paper introduces RoleCDE, a benchmark that evaluates role-playing agents' ability to resolve conflicts between role-specific values and general alignment constraints, revealing a 'Role Value Decoupling' phenomenon.

Toward a Generalized Defense Across Sparse, Continuous, and Structured Parameter Attacks

The paper introduces ParDef, a generalized defense mechanism that mitigates sparse, continuous, and structured parameter attacks on deep neural networks while maintaining high performance.

Papers

cs.CR, cs.LG, cs.SE · Recent · Jun 3, 2026

Toward a Generalized Defense Across Sparse, Continuous, and Structured Parameter Attacks

Bin Duan, Zeyu Bai, Guowei Yang

The paper introduces ParDef, a generalized defense mechanism that mitigates sparse, continuous, and structured parameter attacks on deep neural networks while maintaining high performance.

cs.AI · Recent · Jun 1, 2026