Zhijing Jin
2 indexed papers
Research Timeline
The paper introduces Incremental Completion Decomposition (ICD), a novel jailbreak strategy that successfully bypasses LLM safety mechanisms by eliciting malicious content through a sequence of single-word continuations.
This paper proposes a new framework called STRIDE for training data attribution in Large Language Models.
Papers
STRIDE: Training Data Attribution via Sparse Recovery from Subset Perturbations
Rishit Dagli, Abir Harrasse, Luke Zhang, Florent Draye +3 more
This paper proposes a new framework called STRIDE for training data attribution in Large Language Models.