Jinbiao Zhu
2 indexed papers
Publications per year
Top categories
Frequent co-authors
Research Timeline
DataShield proposes an efficient method to identify safety-degrading samples within benign datasets, quantifying each sample's contribution to an LLM's compliance behavior.
DataShield proposes an efficient method to identify safety-degrading samples within benign datasets, preventing the degradation of LLM safety capabilities during fine-tuning.
Papers
DataShield: Safety-degrading Data Filtering for LLM Benign Instruction Fine-Tuning
Junbo Zhang, Qianli Zhou, Xinyang Deng, Wen Jiang +2 more
DataShield proposes an efficient method to identify safety-degrading samples within benign datasets, quantifying each sample's contribution to an LLM's compliance behavior.