Yanting Wang
4 indexed papers
Publications per year
Top categories
Frequent co-authors
Research Timeline
The paper introduces EnsembleSHAP, a novel, computationally efficient, and provably robust feature attribution method specifically designed for the Random Subspace Method to provide secure explanations.
AgentWatcher is a novel, rule-based monitor designed to detect prompt injection attacks in LLM agents by focusing detection on causally influential context segments, thereby improving scalability and explainability.
The paper introduces PIArena, a unified and extensible platform designed to address the lack of standardized evaluation for prompt injection, revealing critical limitations in current state-of-the-art defenses.
The paper introduces FlashRT, a novel framework that significantly improves the computational and memory efficiency of optimization-based red-teaming attacks against long-context LLMs, enabling systematic security evaluation at scale.
Papers
FlashRT: Towards Computationally and Memory Efficient Red-Teaming for Prompt Injection and Knowledge Corruption
The paper introduces FlashRT, a novel framework that significantly improves the computational and memory efficiency of optimization-based red-teaming attacks against long-context LLMs, enabling system…