~ similar to 2605.07961v1· 20 results
Zihan Liu, Yizhen Wang, Rui Wang, Xiu Tang +1 more
This survey provides a comprehensive, structured taxonomy of split learning techniques for fine-tuning Large Language Models (LLMs), covering model optimization, system efficiency, and privacy preserv…
The paper proposes DP-LAC, a novel lightweight adaptive clipping technique for differentially private federated fine-tuning, which efficiently estimates and adapts the clipping threshold without consu…
FedSpy-LLM introduces a scalable and generalizable data reconstruction attack that can extract private training data from shared gradients of large language models, even when using Parameter-Efficient…
SafeLM is a comprehensive framework that jointly addresses privacy, security, misinformation, and adversarial robustness in federated LLMs, achieving high safety performance while significantly reduci…
Jinghuai Zhang, Yetian He, Kunlin Cai, Han Zhao +2 more
RogueMerge introduces a unified framework to robustly attack LLM model merging by addressing the challenges of autoregressive decoding, unknown merging configurations, and prompt generalization, signi…
Jiaqing Li, Zhibo Zhang, Shide Zhou, Yuxi Li +2 more
The paper introduces TrojanMerge, a framework demonstrating that model merging can be exploited to systematically compromise the safety alignment of multiple individually safe LLMs.
FedAttr introduces a novel client-level attribution protocol for Federated Learning (FL) that accurately identifies which clients trained on watermarked data while maintaining strong privacy guarantee…
Rui Zhang, Hongwei Li, Yun Shen, Xinyue Shen +5 more
The paper investigates how various fine-tuning methods can be used both to intentionally misalign and subsequently realign large language models (LLMs), revealing distinct strengths for attack and def…
Ojas Nimase, Zhe Chen, Gengpei Qi, Yue Zhao +1 more
The paper introduces GEO-Bench, a unified benchmark that standardizes the evaluation of various generative engine optimization (GEO) ranking manipulation attacks, demonstrating that black-box content…
Ojas Nimase, Zhe Chen, Gengpei Qi, Yue Zhao +1 more
GEO-Bench introduces a standardized benchmark to compare various ranking manipulation attacks (both black-box and white-box) on generative engines, demonstrating that black-box content rewriting can b…
This survey provides a comprehensive taxonomy and vulnerability-centric analysis of adversarial attacks targeting Multimodal Large Language Models (MLLMs), offering an explanatory framework for enhanc…
This paper provides a systematic, lifecycle-based framework for analyzing security threats and defenses across the entire fine-tuning process of LLMs, revealing that attack effectiveness is highly mod…
FedDetox introduces a robust framework that sanitizes toxic data on edge devices during federated learning to maintain the safety alignment of Small Language Models (SLMs) without sacrificing utility.
This paper demonstrates that LLM cascade systems, designed for efficiency, are vulnerable to targeted adversarial attacks that simultaneously degrade both performance and cost-efficiency.
Haichao Sha, Zihao Wang, Yuncheng Wu, Hong Chen +1 more
The paper proposes DP-SelFT, a novel framework for differentially private selective fine-tuning that significantly improves the privacy-utility trade-off for LLMs by intelligently selecting robust par…
Yige Liu, Dexuan Xu, Zimai Guo, Yongzhi Cao +1 more
This paper analyzes label inference attacks in Vertical Federated Learning (VFL), demonstrating that existing attacks rely on feature-label distribution alignment, and proposes a zero-overhead defense…
Ye Sun, Xin Wang, Jiaming Zhang, Yifeng Gao +6 more
DarkLLM introduces a novel framework that uses a Large Language Model (LLM) to translate natural language instructions into flexible, latent adversarial attack vectors, demonstrating a systemic vulner…
The paper introduces Rotated Robustness (RoR), a training-free defense that uses orthogonal transformations to prevent catastrophic model collapse in LLMs caused by hardware bit-flip attacks.
Yanming Mu, Hao Hu, Feiyang Li, Qiao Yuan +6 more
This paper provides the first comprehensive, end-to-end survey dedicated to the security of Retrieval-Augmented Generation (RAG) systems, systematically mapping threats, defenses, and benchmarks acros…
Zihan Wang, Rui Zhang, Yu Liu, Chi Liu +3 more
This paper presents the first systematic study of black-box skill stealing attacks against proprietary LLM agents, demonstrating that structured agent skills can be easily extracted, posing a signific…