Ting Wang
8 indexed papers
Publications per year
Top categories
Frequent co-authors
Research Timeline
The paper introduces EnsembleSHAP, a novel, computationally efficient, and provably robust feature attribution method specifically designed for the Random Subspace Method to provide secure explanations.
AgentWatcher is a novel, rule-based monitor designed to detect prompt injection attacks in LLM agents by focusing detection on causally influential context segments, thereby improving scalability and explainability.
The paper introduces PIArena, a unified and extensible platform designed to address the lack of standardized evaluation for prompt injection, revealing critical limitations in current state-of-the-art defenses.
The paper introduces FlashRT, a novel framework that significantly improves the computational and memory efficiency of optimization-based red-teaming attacks against long-context LLMs, enabling systematic security evaluation at scale.
The paper introduces MAGE, a novel defensive framework that uses a dedicated 'shadow memory' to proactively detect and mitigate long-horizon threats against LLM agents during complex, multi-step interactions.
The paper proposes M extsuperscript{3}Att, a knowledge-poisoning framework that injects covert misinformation into medical multimodal RAG systems using paired visual data triggers, demonstrating attacks that generate clinically plausible but incorrect diagnoses.
The paper introduces a diagnostic-driven iterative refinement process for improving LLM-generated reward functions in sparse, structured reinforcement learning tasks, significantly boosting agent performance.
The paper theoretically analyzes the limitations of parameter-based knowledge editing and empirically demonstrates that these methods consistently damage core LLM capabilities compared to retrieval-based baselines.
Papers
Revisiting Parameter-Based Knowledge Editing in Large Language Models: Theoretical Limits and Empirical Evidence
Wanying Ren, Xin Song, Futing Wang, Guoxiu He +1 more
The paper theoretically analyzes the limitations of parameter-based knowledge editing and empirically demonstrates that these methods consistently damage core LLM capabilities compared to retrieval-ba…