Built with and by Teycir Ben Soltane•
How to Use•FAQ•GitHub•arXiv.org•
Share:
ArXivCSExplorer
☆☆Bookmarks🏆RSSHow to UseFAQ
Home/Authors/Hao Wu

Hao Wu

15 indexed papers

Recent (6 mo)
15
With code
0
Influential cites
0
Benchmarked
0

Publications per year

15
26

Top categories

AI×11Crypto×5ML×3NLP×3Vision×1Systems and Control×1Software Eng.×1

Frequent co-authors

Yanzhao Wu2×
Yuhao Wu2×
Yue Zhang2×
Songhao Wu1×
Ang Lv1×
Ruobing Xie1×

Research Timeline

2026
GUARD-SLM: Token Activation-Based Defense Against Jailbreak Attacks for Small Language Models

The paper proposes GUARD-SLM, a token activation-based defense mechanism, to enhance the robustness of Small Language Models (SLMs) against various jailbreak attacks by analyzing and filtering malicious patterns in the model's internal representation space.

SAGE: Signal-Amplified Guided Embeddings for LLM-based Vulnerability Detection

The paper proposes SAGE, a framework that uses Signal-Amplified Guided Embeddings to overcome 'Signal Submersion' in LLMs, significantly boosting vulnerability detection accuracy across multiple programming languages.

MASCing: Configurable Mixture-of-Experts Behavior via Activation Steering Masks

MASCing is a novel framework that enables flexible, non-retraining reconfiguration of Mixture-of-Experts (MoE) models for specific safety objectives by applying activation steering masks to control expert selection.

Usability as a Weapon: Attacking the Safety of LLM-Based Code Generation via Usability Requirements

This paper introduces UPAttack, a novel threat model demonstrating that focusing on explicit usability requirements can cause LLMs to generate insecure code by neglecting implicit security constraints, and proposes U-SPLOIT to automate this attack.

Behavioral Integrity Verification for AI Agent Skills

The paper introduces Behavioral Integrity Verification (BIV), a framework that systematically audits AI agent skills by comparing their declared capabilities against their actual implementation, revealing a high rate of behavioral deviation.

Reasoning Matters: Mitigate Hallucination in Multimodal Large Reasoning Models via Reasoning-Conditioned Preference Optimization

The paper proposes Reasoning-Conditioned Direct Preference Optimization (RC-DPO) to effectively mitigate hallucinations in multimodal large reasoning models by explicitly conditioning the preference optimization on the Chain-of-Thought (CoT) process.

Reinforcement Learning with Robust Rubric Rewards

The paper introduces $ ext{RLR}^3$, a novel framework that extends verifiable rewards in Reinforcement Learning to handle partially verifiable, multi-criteria vision-language tasks by integrating robust rubric scoring.

Diagnosing Harmful Continuation in Answer-Correct Long-CoT Training Traces

The paper identifies and demonstrates that post-conclusion continuation in answer-correct long-CoT traces is harmful during LLM fine-tuning, proposing a method to cut this continuation.

Linear Ensembles Wash Away Watermarks: On the Fragility of Distributional Perturbations in LLMs

The paper demonstrates that combining outputs from multiple large language models (LLMs) effectively cancels out statistical watermarks, revealing a fundamental vulnerability in current AI text detection methods.

Closed-Loop Neural Activation Control in Vision-Language-Action Models

The paper proposes CTRL-STEER, a closed-loop framework that adaptively adjusts intervention strength to stabilize concept regulation and improve task success in Vision-Language-Action models without retraining the base model.

WaveFilter: Enhancing the Long-Context Capability of Diffusion LLMs via Wavelet-Guided KV Cache Filtering

WaveFilter is a novel, training-free framework that uses wavelet transforms to efficiently filter critical tokens in the KV cache, significantly improving the long-context performance of Diffusion LLMs.

Early Diagnosis of Wasted Computation in Multi-Agent LLM Systems via Failure-Aware Observability

This paper introduces a failure-aware observability framework to diagnose wasted computation in multi-agent LLM systems by mapping recurring failure modes to online trace signals.

Policy and World Modeling Co-Training for Language Agents

The paper proposes PaW, a co-training framework that uses standard RL rollouts to provide auxiliary world model supervision directly during policy training, significantly improving language agent performance.

CityTrajBench: A Unified Benchmark for City-Scale Vehicle Trajectory Generation

The paper introduces CityTrajBench, a unified benchmark framework that standardizes the evaluation of city-scale vehicle trajectory generation, demonstrating that no single generation model dominates all performance metrics.

Redesign Mixture-of-Experts Routers with Manifold Power Iteration

This paper proposes a new router redesign for Mixture-of-Experts models using Manifold Power Iteration to align router rows with the principal singular directions of associated experts.

Highlighted terms show continued research focus across papers

Papers

cs.LGcs.AIcs.CLEmpiricalRecentJun 10, 2026

Redesign Mixture-of-Experts Routers with Manifold Power Iteration

Songhao Wu, Ang Lv, Ruobing Xie, Yankai Lin

This paper proposes a new router redesign for Mixture-of-Experts models using Manifold Power Iteration to align router rows with the principal singular directions of associated experts.

View →
cs.LGcs.AIRecent
Jun 1, 2026

Policy and World Modeling Co-Training for Language Agents

Ning Lu, Baijiong Lin, Shengcai Liu, Jiahao Wu +8 more

The paper proposes PaW, a co-training framework that uses standard RL rollouts to provide auxiliary world model supervision directly during policy training, significantly improving language agent perf…

View →
cs.LGcs.AIRecentJun 1, 2026

CityTrajBench: A Unified Benchmark for City-Scale Vehicle Trajectory Generation

Shibo Zhu, Xiaodan Shi, Dayin Chen, Yuntian Chen +3 more

The paper introduces CityTrajBench, a unified benchmark framework that standardizes the evaluation of city-scale vehicle trajectory generation, demonstrating that no single generation model dominates…

View →
cs.AIRecentMay 31, 2026

Early Diagnosis of Wasted Computation in Multi-Agent LLM Systems via Failure-Aware Observability

Xianyou Li, Weiran Yan, Yichao Wu, Penghao Liang +3 more

This paper introduces a failure-aware observability framework to diagnose wasted computation in multi-agent LLM systems by mapping recurring failure modes to online trace signals.

View →
cs.CLcs.AIRecentMay 30, 2026

WaveFilter: Enhancing the Long-Context Capability of Diffusion LLMs via Wavelet-Guided KV Cache Filtering

Jinnan Yang, Yan Wang, Zhen Bi, Kehao Wu +4 more

WaveFilter is a novel, training-free framework that uses wavelet transforms to efficiently filter critical tokens in the KV cache, significantly improving the long-context performance of Diffusion LLM…

View →
cs.AIRecentMay 29, 2026

Closed-Loop Neural Activation Control in Vision-Language-Action Models

Abhijith Babu, Ramneet Kaur, Nathaniel D. Bastian, Olivera Kotevska +4 more

The paper proposes CTRL-STEER, a closed-loop framework that adaptively adjusts intervention strength to stabilize concept regulation and improve task success in Vision-Language-Action models without r…

View →
cs.CVcs.AIRecentMay 28, 2026

Reinforcement Learning with Robust Rubric Rewards

Ya-Qi Yu, Hao Wang, Fangyu Hong, Xiangyang Qu +14 more

The paper introduces $ ext{RLR}^3$, a novel framework that extends verifiable rewards in Reinforcement Learning to handle partially verifiable, multi-criteria vision-language tasks by integrating robu…

View →
cs.AIRecentMay 28, 2026

Diagnosing Harmful Continuation in Answer-Correct Long-CoT Training Traces

Chen He, Yuhao Wu, Lei Wang, Wenxuan Zhang +1 more

The paper identifies and demonstrates that post-conclusion continuation in answer-correct long-CoT traces is harmful during LLM fine-tuning, proposing a method to cut this continuation.

View →
cs.CLRecentMay 28, 2026

Linear Ensembles Wash Away Watermarks: On the Fragility of Distributional Perturbations in LLMs

Zhihao Wu, Gracia Gong, Qinglin Zhu, Yudong Chen +1 more

The paper demonstrates that combining outputs from multiple large language models (LLMs) effectively cancels out statistical watermarks, revealing a fundamental vulnerability in current AI text detect…

View →
cs.AIRecentMay 27, 2026

Reasoning Matters: Mitigate Hallucination in Multimodal Large Reasoning Models via Reasoning-Conditioned Preference Optimization

Jiawei Kong, Hao Fang, Shunxiang Liao, Jinyu Li +4 more

The paper proposes Reasoning-Conditioned Direct Preference Optimization (RC-DPO) to effectively mitigate hallucinations in multimodal large reasoning models by explicitly conditioning the preference o…

View →
cs.CRcs.AIeess.SYRecentMay 12, 2026

Behavioral Integrity Verification for AI Agent Skills

Yuhao Wu, Tung-Ling Li, Hongliang Liu

The paper introduces Behavioral Integrity Verification (BIV), a framework that systematically audits AI agent skills by comparing their declared capabilities against their actual implementation, revea…

View →
cs.CRcs.SERecentMay 11, 2026

Usability as a Weapon: Attacking the Safety of LLM-Based Code Generation via Usability Requirements

Yue Li, Xiao Li, Hao Wu, Yue Zhang +4 more

This paper introduces UPAttack, a novel threat model demonstrating that focusing on explicit usability requirements can cause LLMs to generate insecure code by neglecting implicit security constraints…

View →
cs.CRRecentApr 30, 2026

MASCing: Configurable Mixture-of-Experts Behavior via Activation Steering Masks

Jona te Lintelo, Lichao Wu, Marina Krček, Sengim Karayalçin +1 more

MASCing is a novel framework that enables flexible, non-retraining reconfiguration of Mixture-of-Experts (MoE) models for specific safety objectives by applying activation steering masks to control ex…

View →
cs.CRRecentApr 21, 2026

SAGE: Signal-Amplified Guided Embeddings for LLM-based Vulnerability Detection

Zhengyang Shan, Xu Qian, Jiayun Xin, Minghui Xu +4 more

The paper proposes SAGE, a framework that uses Signal-Amplified Guided Embeddings to overcome 'Signal Submersion' in LLMs, significantly boosting vulnerability detection accuracy across multiple progr…

View →
cs.CRcs.AIRecentMar 28, 2026

GUARD-SLM: Token Activation-Based Defense Against Jailbreak Attacks for Small Language Models

Md Jueal Mia, Joaquin Molto, Yanzhao Wu, M. Hadi Amini

The paper proposes GUARD-SLM, a token activation-based defense mechanism, to enhance the robustness of Small Language Models (SLMs) against various jailbreak attacks by analyzing and filtering malicio…

View →