Hao Wu

15 indexed papers

Top categories

AI ×11 | Crypto ×5 | ML ×3 | NLP ×3 | Vision ×1 | Systems and Control ×1 | Software Eng. ×1

Frequent co-authors

Yanzhao Wu (2×)

Yuhao Wu (2×)

Yue Zhang (2×)

Songhao Wu (1×)

Ang Lv (1×)

Ruobing Xie (1×)

Research Timeline

2026

GUARD-SLM: Token Activation-Based Defense Against Jailbreak Attacks for Small Language Models

The paper proposes GUARD-SLM, a token activation-based defense mechanism, to enhance the robustness of Small Language Models (SLMs) against various jailbreak attacks by analyzing and filtering malicious patterns in the model's internal representation space.

SAGE: Signal-Amplified Guided Embeddings for LLM-based Vulnerability Detection

The paper proposes SAGE, a framework that uses Signal-Amplified Guided Embeddings to overcome 'Signal Submersion' in LLMs, significantly boosting vulnerability detection accuracy across multiple programming languages.

MASCing: Configurable Mixture-of-Experts Behavior via Activation Steering Masks

MASCing is a novel framework that enables flexible, non-retraining reconfiguration of Mixture-of-Experts (MoE) models for specific safety objectives by applying activation steering masks to control expert selection.

Usability as a Weapon: Attacking the Safety of LLM-Based Code Generation via Usability Requirements

This paper introduces UPAttack, a novel threat model demonstrating that focusing on explicit usability requirements can cause LLMs to generate insecure code by neglecting implicit security constraints, and proposes U-SPLOIT to automate this attack.

Behavioral Integrity Verification for AI Agent Skills

The paper introduces Behavioral Integrity Verification (BIV), a framework that systematically audits AI agent skills by comparing their declared capabilities against their actual implementation, revealing a high rate of behavioral deviation.

Reasoning Matters: Mitigate Hallucination in Multimodal Large Reasoning Models via Reasoning-Conditioned Preference Optimization

The paper proposes Reasoning-Conditioned Direct Preference Optimization (RC-DPO) to effectively mitigate hallucinations in multimodal large reasoning models by explicitly conditioning the preference optimization on the Chain-of-Thought (CoT) process.

Reinforcement Learning with Robust Rubric Rewards

The paper introduces $\text{RLR}^3$, a novel framework that extends verifiable rewards in Reinforcement Learning to handle partially verifiable, multi-criteria vision-language tasks by integrating robust rubric scoring.

Diagnosing Harmful Continuation in Answer-Correct Long-CoT Training Traces

The paper shows that post-conclusion continuation in answer-correct long-CoT traces is harmful during LLM fine-tuning, and proposes a method to cut this continuation.

Linear Ensembles Wash Away Watermarks: On the Fragility of Distributional Perturbations in LLMs

The paper demonstrates that combining outputs from multiple large language models (LLMs) effectively cancels out statistical watermarks, revealing a fundamental vulnerability in current AI text detection methods.

Closed-Loop Neural Activation Control in Vision-Language-Action Models

The paper proposes CTRL-STEER, a closed-loop framework that adaptively adjusts intervention strength to stabilize concept regulation and improve task success in Vision-Language-Action models without retraining the base model.

WaveFilter: Enhancing the Long-Context Capability of Diffusion LLMs via Wavelet-Guided KV Cache Filtering

WaveFilter is a novel, training-free framework that uses wavelet transforms to efficiently filter critical tokens in the KV cache, significantly improving the long-context performance of Diffusion LLMs.

Early Diagnosis of Wasted Computation in Multi-Agent LLM Systems via Failure-Aware Observability

This paper introduces a failure-aware observability framework to diagnose wasted computation in multi-agent LLM systems by mapping recurring failure modes to online trace signals.

Policy and World Modeling Co-Training for Language Agents

The paper proposes PaW, a co-training framework that uses standard RL rollouts to provide auxiliary world model supervision directly during policy training, significantly improving language agent performance.

CityTrajBench: A Unified Benchmark for City-Scale Vehicle Trajectory Generation

The paper introduces CityTrajBench, a unified benchmark framework that standardizes the evaluation of city-scale vehicle trajectory generation, demonstrating that no single generation model dominates all performance metrics.

Redesign Mixture-of-Experts Routers with Manifold Power Iteration

This paper proposes a router redesign for Mixture-of-Experts models that uses Manifold Power Iteration to align router rows with the principal singular directions of their associated experts.

Papers

cs.LG | cs.AI | cs.CL | Empirical | Recent | Jun 10, 2026

Redesign Mixture-of-Experts Routers with Manifold Power Iteration

Songhao Wu, Ang Lv, Ruobing Xie, Yankai Lin

This paper proposes a router redesign for Mixture-of-Experts models that uses Manifold Power Iteration to align router rows with the principal singular directions of their associated experts.

cs.LG | cs.AI | Recent