Bin Yang
4 indexed papers
Publications per year
Top categories
Frequent co-authors
Research Timeline
The paper proposes a novel hardware extension that enables deterministic, kernel-bypass switching to user-level protection domains upon interrupt arrival, significantly reducing worst-case latency for critical real-time systems.
The paper investigates how dynamic adversarial fine-tuning (R2D2) reorganizes the internal mechanisms (refusal geometry) of safety-aligned language models, finding that it shifts the optimal refusal control carrier from late to early layers along a robustness-utility frontier.
The paper introduces VIP-Net, a framework that leverages multi-modal spatio-temporal cues and a new dataset (Temporal-VIP) to accurately identify the most influential people in videos, overcoming the challenge of Temporal Importance Shift (TIS).
Membrane introduces a self-evolving guardrail using Contrastive Safety Memory (CSM) that generalizes across topical jailbreak variants, achieving superior safety performance while minimizing benign refusal rates.
Papers
Membrane: A Self-Evolving Contrastive Safety Memory for LLM Agent Defense
Minseok Choi, Seungbin Yang, Dongjin Kim, Subin Kim +4 more
Membrane introduces a self-evolving guardrail using Contrastive Safety Memory (CSM) that generalizes across topical jailbreak variants, achieving superior safety performance while minimizing benign re…