Kai Zhang
8 indexed papers
Publications per year
Top categories
Frequent co-authors
Research Timeline
SelfGrader proposes a lightweight, robust guardrail for detecting LLM jailbreaks by formulating the detection problem as a numerical grading task using anchored token-level logits, achieving strong performance across various benchmarks.
XMark introduces a novel multi-bit watermarking technique that reliably embeds binary messages into LLM-generated text while maintaining high text quality and robust performance even with limited token context.
ClawGuard introduces a passive, out-of-band security monitor that detects LLM agent workflow hijacking by analyzing unique electromagnetic (EM) emanations generated during agent skill execution.
The paper introduces MT-JailBench, a modular framework for evaluating multi-turn jailbreaks, demonstrating that controlling experimental components like prompt generation and resource budgets is crucial for fair comparison and understanding attack success.
The paper introduces Causal Editing (CODE), a new paradigm that improves knowledge updates in LLMs by grounding fact injection in causal narratives, drastically reducing self-refutation rates.
The paper introduces CardioLens, a rigorous evaluation testbed for multi-sequence Cardiac MRI, which reveals that current Multimodal Large Language Models (MLLMs) exhibit a significant 'clinical reality gap' and perform poorly when simulating real-world cardiac interpretation workflows.
The paper introduces Humanoid-GPT, a large-scale generative Transformer model that achieves robust zero-shot motion tracking and control by training on a massive, unified corpus of motion data.
This survey provides a systematic framework and taxonomy for evidence tracing and execution provenance in LLM agents, addressing the difficulty of verifying and auditing complex agent behaviors.
Papers
From Agent Traces to Trust: Evidence Tracing and Execution Provenance in LLM Agents
Yiqi Wang, Jiaqi Zhang, Taotao Cai, Zirui Liu +5 more
This survey provides a systematic framework and taxonomy for evidence tracing and execution provenance in LLM agents, addressing the difficulty of verifying and auditing complex agent behaviors.