~ similar to 2604.16606v1· 20 results
Zihan Liu, Yizhen Wang, Rui Wang, Xiu Tang +1 more
This survey provides a comprehensive, structured taxonomy of split learning techniques for fine-tuning Large Language Models (LLMs), covering model optimization, system efficiency, and privacy preserv…
Jeongho Yoon, Chanhee Park, Yongchan Chun, Hyeonseok Moon +1 more
The paper introduces Privacy-Preserving Fine-Tuning (PPFT), a novel two-stage pipeline that allows LLMs to process sensitive data via pooled embeddings rather than raw text, achieving a strong balance…
FedDetox introduces a robust framework that sanitizes toxic data on edge devices during federated learning to maintain the safety alignment of Small Language Models (SLMs) without sacrificing utility.
Zhihao Liu, Yifan Wu, Jian Lou, Di Wang +2 more
The paper proposes a novel zeroth-order optimization framework to enhance the robustness of LLM safety alignment, showing that few refinement steps can significantly improve safety while maintaining u…
Hao Li, Jingkun An, Zijun Song, Pengyu Zhu +7 more
SafeSteer proposes a localized on-policy distillation method that restricts safety alignment to specific safety tokens, thereby achieving strong safety performance with minimal degradation to general…
Karima Makhlouf, Lamiaa Basyoni, Syed Khaderi, Gabriel Marquez +3 more
This paper conducts a structured ablation study using a unified threat model to evaluate how various system factors (like model architecture and retrieval configuration) influence different types of p…
Peihua Mai, Xuanrong Gao, Youlong Ding, Xianglong Du +2 more
SharedRequest introduces a model-agnostic framework that enhances LLM privacy and efficiency by batching and mixing prompts with noisy variants, achieving high utility and significant cost reduction.
SafeTune is a framework that enhances the robustness of LLMs fine-tuned for RTL code generation by detecting and mitigating data poisoning attacks, particularly those aiming to insert hardware Trojans…
The paper proposes DP-LAC, a novel lightweight adaptive clipping technique for differentially private federated fine-tuning, which efficiently estimates and adapts the clipping threshold without consu…
Hanlin Cai, Kai Li, Houtianfu Wang, Haofan Dong +3 more
The paper proposes an Augmented Model maniPulation (AugMP) strategy, utilizing graph representation learning, to effectively and stealthily manipulate federated fine-tuning of LLMs, significantly degr…
Xixun Lin, Yang Liu, Yancheng Chen, Yongxuan Wu +7 more
The paper introduces SafeHarness, a novel, lifecycle-integrated security architecture that significantly reduces unsafe behavior and attack success rates in LLM agents by weaving multiple defense laye…
FedSpy-LLM introduces a scalable and generalizable data reconstruction attack that can extract private training data from shared gradients of large language models, even when using Parameter-Efficient…
The paper introduces ImageProtector, a user-side method that embeds an imperceptible perturbation into images to prevent Multi-modal Large Language Models (MLLMs) from analyzing and extracting sensiti…
Zhe Liu, Zonghao Ying, Wenxin Zhang, Quanchen Zou +4 more
SafeHarbor is a novel, hierarchical memory-augmented framework that establishes context-aware decision boundaries for LLM agents, achieving state-of-the-art safety while minimizing over-refusal.
This paper addresses the vulnerability of existing LLM safety monitors to adaptive attackers and proposes activation watermarking, a technique that significantly improves detection robustness against…
GLiGuard introduces a compact, schema-conditioned bidirectional encoder that achieves state-of-the-art performance in LLM content moderation across multiple safety dimensions while drastically reducin…
The paper introduces GuardPhish, a large-scale dataset and evaluation framework, demonstrating that even high-performing open-source LLMs can generate actionable phishing content despite accurate inte…
Guanlong Wu, Zhaohan li, Yao Zhang, Zheng Zhang +3 more
CachePrune introduces a privacy-aware, fine-grained KV cache sharing mechanism that allows LLM inference systems to safely reuse cache entries across users' requests, significantly improving efficienc…
The paper demonstrates that encoding harmful prompts as genuine mathematical problems, rather than just using mathematical formatting, effectively bypasses the safety filters of large language models.
Rui Zhang, Hongwei Li, Yun Shen, Xinyue Shen +5 more
The paper investigates how various fine-tuning methods can be used both to intentionally misalign and subsequently realign large language models (LLMs), revealing distinct strengths for attack and def…