Built by Teycir Ben Soltane
ArXivCSExplorer

Wei Yang

11 indexed papers

Recent (6 mo): 11
With code: 0
Influential cites: 0
Benchmarked: 0

Publications per year

2026: 11

Top categories

AI ×8, Crypto ×6, ML ×6, NLP ×3, Software Eng. ×1, Multiagent ×1, Info Retrieval ×1, Vision ×1

Frequent co-authors

Wei Yang Bryan Lim (2×)
Bin Duan (1×)
Zeyu Bai (1×)
Guowei Yang (1×)
Huayi Lai (1×)
Shichao Song (1×)

Research Timeline

2026
ACRFence: Preventing Semantic Rollback Attacks in Agent Checkpoint-Restore

ACRFence introduces a framework-agnostic mitigation to prevent semantic rollback attacks in LLM agents by recording irreversible tool effects and enforcing strict replay-or-fork semantics upon checkpoint restoration.

Tracing the Dynamics of Refusal: Exploiting Latent Refusal Trajectories for Robust Jailbreak Detection

The paper proposes SALO, a novel detector that monitors the dynamic, layer-wise activation pattern (Refusal Trajectory) to improve jailbreak detection robustness compared to traditional methods relying on static terminal representations.

AESOP: Adversarial Execution-path Selection to Overload Deep Learning Pipelines

AESOP introduces an adversarial attack that targets the entire execution path of deep learning pipelines, demonstrating that path-aware selection can inflate computational costs by orders of magnitude more than single-model attacks.

FraudBench: A Multimodal Benchmark for Detecting AI-Generated Fraudulent Refund Evidence

The paper introduces FraudBench, a multimodal benchmark designed to detect AI-generated fraudulent refund evidence, finding that current AI models struggle significantly with claim-conditioned fake-damage detection.

Jailbreak susceptibility prediction and mitigation via the behavioral geometry of models

The paper introduces a framework using the 'behavioral geometry' of model populations to efficiently predict jailbreak susceptibility and transfer defenses, achieving high accuracy with significantly fewer evaluations.

Toward User Preference Alignment in LLM Recommendation via Explicit Context Feedback

The paper advocates for integrating explicit contextual feedback (like reviews and comments) into LLM-based recommender systems to achieve more personalized, transparent, and semantically aligned recommendations.

Why Specialist Models Still Matter: A Heterogeneous Multi-Agent Paradigm for Medical Artificial Intelligence

The paper proposes HetMedAgent, a multi-agent framework, demonstrating that combining generalist LLMs with domain-specific specialist models significantly improves medical AI performance by enabling structured collaboration.

ReasonLight: A Multimodal Foundation Model-Enhanced Reinforcement Learning Framework for Zero-Shot Traffic Signal Control

ReasonLight is a multimodal foundation model-enhanced RL framework that enables zero-shot traffic signal control by semantically refining RL-proposed actions using heterogeneous sensor and camera data.

Internalize the Temperature: On-Policy Self-Distillation as Policy Reheater for Reinforcement Learning

The paper introduces Temperature-Scaled On-Policy Self-Distillation (TS-OPSD), a novel method that internalizes temperature-based policy reheating into model parameters to combat entropy collapse in reinforcement learning.

RoleCDE: Benchmarking and Mitigating Role-Alignment Trade-offs in Role-Playing Agents

The paper introduces RoleCDE, a novel benchmark that evaluates role-playing agents' ability to resolve conflicts between role-specific values and general alignment constraints, revealing a 'Role Value Decoupling' phenomenon.

Toward a Generalized Defense Across Sparse, Continuous, and Structured Parameter Attacks

The paper introduces ParDef, a generalized defense mechanism that effectively mitigates various types of parameter attacks on deep neural networks while maintaining high performance.


Papers

cs.CR, cs.LG, cs.SE | Recent | Jun 3, 2026

Toward a Generalized Defense Across Sparse, Continuous, and Structured Parameter Attacks

Bin Duan, Zeyu Bai, Guowei Yang


cs.AI | Recent | Jun 1, 2026

RoleCDE: Benchmarking and Mitigating Role-Alignment Trade-offs in Role-Playing Agents

Huayi Lai, Shichao Song, Simin Niu, Hanyu Wang +4 more


cs.CL, cs.LG | Recent | May 30, 2026

Internalize the Temperature: On-Policy Self-Distillation as Policy Reheater for Reinforcement Learning

Xuewei Yang, Jiachen Yu, Jie Wu, Shaoning Sun +2 more


cs.AI, cs.CL, cs.LG | Recent | May 28, 2026

Why Specialist Models Still Matter: A Heterogeneous Multi-Agent Paradigm for Medical Artificial Intelligence

Yanan Wang, Shuaicong Hu, Jian Liu, Guohui Zhou +2 more


cs.AI | Recent | May 28, 2026

ReasonLight: A Multimodal Foundation Model-Enhanced Reinforcement Learning Framework for Zero-Shot Traffic Signal Control

Aoyu Pang, Maonan Wang, Yuejiao Xie, Chung Shue Chen +2 more


cs.IR, cs.AI | Recent | May 27, 2026

Toward User Preference Alignment in LLM Recommendation via Explicit Context Feedback

Weizhi Zhang, Wooseong Yang, Yuxin Cui, Zhaohui Guo +8 more


cs.CR, cs.AI, cs.LG | Recent | May 26, 2026

Jailbreak susceptibility prediction and mitigation via the behavioral geometry of models

Hayden Helm, Xiaodong Liu, Weiwei Yang


cs.LG, cs.AI, cs.CR | Recent | May 9, 2026

AESOP: Adversarial Execution-path Selection to Overload Deep Learning Pipelines

Tingxi Li, Mingfang Ji, Ravishka Shemal Rathnasuriya, Simin Chen +2 more


cs.CV, cs.AI, cs.CR | Recent | May 9, 2026

FraudBench: A Multimodal Benchmark for Detecting AI-Generated Fraudulent Refund Evidence

Xinyu Yan, Boyang Chen, Jiaming Zhang, Tiantong Wu +11 more


cs.CR, cs.AI, cs.CL | Recent | May 2, 2026

Tracing the Dynamics of Refusal: Exploiting Latent Refusal Trajectories for Robust Jailbreak Detection

Xulin Hu, Che Wang, Wei Yang Bryan Lim, Jianbo Gao +1 more


cs.CR | Recent | Mar 21, 2026

ACRFence: Preventing Semantic Rollback Attacks in Agent Checkpoint-Restore

Yusheng Zheng, Yiwei Yang, Wei Zhang, Andi Quinn

