Hao Wu
15 indexed papers
Research Timeline
The paper proposes GUARD-SLM, a token activation-based defense mechanism, to enhance the robustness of Small Language Models (SLMs) against various jailbreak attacks by analyzing and filtering malicious patterns in the model's internal representation space.
The paper proposes SAGE, a framework that uses Signal-Amplified Guided Embeddings to overcome 'Signal Submersion' in LLMs, significantly boosting vulnerability detection accuracy across multiple programming languages.
MASCing is a novel framework that enables flexible, non-retraining reconfiguration of Mixture-of-Experts (MoE) models for specific safety objectives by applying activation steering masks to control expert selection.
This paper introduces UPAttack, a novel threat model demonstrating that focusing on explicit usability requirements can cause LLMs to generate insecure code by neglecting implicit security constraints, and proposes U-SPLOIT to automate this attack.
The paper introduces Behavioral Integrity Verification (BIV), a framework that systematically audits AI agent skills by comparing their declared capabilities against their actual implementation, revealing a high rate of behavioral deviation.
The paper proposes Reasoning-Conditioned Direct Preference Optimization (RC-DPO) to effectively mitigate hallucinations in multimodal large reasoning models by explicitly conditioning the preference optimization on the Chain-of-Thought (CoT) process.
The paper introduces $\text{RLR}^3$, a novel framework that extends verifiable rewards in Reinforcement Learning to handle partially verifiable, multi-criteria vision-language tasks by integrating robust rubric scoring.
The paper shows that post-conclusion continuation in answer-correct long-CoT traces is harmful during LLM fine-tuning and proposes a method to cut this continuation.
The paper demonstrates that combining outputs from multiple large language models (LLMs) effectively cancels out statistical watermarks, revealing a fundamental vulnerability in current AI text detection methods.
The paper proposes CTRL-STEER, a closed-loop framework that adaptively adjusts intervention strength to stabilize concept regulation and improve task success in Vision-Language-Action models without retraining the base model.
WaveFilter is a novel, training-free framework that uses wavelet transforms to efficiently filter critical tokens in the KV cache, significantly improving the long-context performance of Diffusion LLMs.
This paper introduces a failure-aware observability framework to diagnose wasted computation in multi-agent LLM systems by mapping recurring failure modes to online trace signals.
The paper proposes PaW, a co-training framework that uses standard RL rollouts to provide auxiliary world model supervision directly during policy training, significantly improving language agent performance.
The paper introduces CityTrajBench, a unified benchmark framework that standardizes the evaluation of city-scale vehicle trajectory generation, demonstrating that no single generation model dominates all performance metrics.
This paper proposes a router redesign for Mixture-of-Experts models that uses Manifold Power Iteration to align router rows with the principal singular directions of their associated experts.
Papers
Redesign Mixture-of-Experts Routers with Manifold Power Iteration
This paper proposes a router redesign for Mixture-of-Experts models that uses Manifold Power Iteration to align router rows with the principal singular directions of their associated experts.