Wei Yang
11 indexed papers
Research Timeline
ACRFence introduces a framework-agnostic mitigation to prevent semantic rollback attacks in LLM agents by recording irreversible tool effects and enforcing strict replay-or-fork semantics upon checkpoint restoration.
The paper proposes SALO, a novel detector that monitors the dynamic, layer-wise activation pattern (Refusal Trajectory) to improve jailbreak detection robustness compared to traditional methods relying on static terminal representations.
AESOP introduces an adversarial attack that targets the entire execution path of deep learning pipelines, demonstrating that path-aware selection can inflate computational costs by orders of magnitude more than single-model attacks.
The paper introduces FraudBench, a multimodal benchmark designed to detect AI-generated fraudulent refund evidence, finding that current AI models struggle significantly with claim-conditioned fake-damage detection.
The paper introduces a framework using the 'behavioral geometry' of model populations to efficiently predict jailbreak susceptibility and transfer defenses, achieving high accuracy with significantly fewer evaluations.
The paper advocates for integrating explicit contextual feedback (like reviews and comments) into LLM-based recommender systems to achieve more personalized, transparent, and semantically aligned recommendations.
The paper proposes HetMedAgent, a multi-agent framework that combines generalist LLMs with domain-specific specialist models through structured collaboration, and reports that this combination significantly improves medical AI performance.
ReasonLight is a multimodal foundation model-enhanced RL framework that enables zero-shot traffic signal control by semantically refining RL-proposed actions using heterogeneous sensor and camera data.
The paper introduces Temperature-Scaled On-Policy Self-Distillation (TS-OPSD), a novel method that internalizes temperature-based policy reheating into model parameters to combat entropy collapse in reinforcement learning.
The paper introduces RoleCDE, a novel benchmark that evaluates role-playing agents' ability to resolve conflicts between role-specific values and general alignment constraints, revealing a 'Role Value Decoupling' phenomenon.
The paper introduces ParDef, a generalized defense mechanism that mitigates sparse, continuous, and structured parameter attacks on deep neural networks while maintaining high performance.
Papers
Toward a Generalized Defense Across Sparse, Continuous, and Structured Parameter Attacks
The paper introduces ParDef, a generalized defense mechanism that mitigates sparse, continuous, and structured parameter attacks on deep neural networks while maintaining high performance.