Yanzhao Wu
2 indexed papers
Research Timeline
The paper proposes GUARD-SLM, a token activation-based defense mechanism, to enhance the robustness of Small Language Models (SLMs) against various jailbreak attacks by analyzing and filtering malicious patterns in the model's internal representation space.
The paper proposes CTRL-STEER, a closed-loop framework that adaptively adjusts intervention strength to stabilize concept regulation and improve task success in Vision-Language-Action models without retraining the base model.
Papers
Closed-Loop Neural Activation Control in Vision-Language-Action Models