~ similar to 2604.04738v1· 20 results
The paper demonstrates that current defenses against malicious fine-tuning of foundation models are insufficient because they only address fixed attacks, and introduces a unified adaptive attack that…
The paper introduces a lightweight, sampling-based cryptographic protocol for verifiable AI inference that drastically reduces proving overhead from minutes to milliseconds by leveraging statistical p…
The paper proves that platform-deterministic inference is a necessary and sufficient condition for trustworthy AI, establishing that AI trust fundamentally relies on consistent arithmetic.
Zi Li, Tian Zhou, Wenze Li, Jingyu Hua +2 more
This paper introduces a novel supply-chain attack that uses model code backdoors to actively steal sensitive secrets from local LLM fine-tuning datasets, bypassing current privacy defenses.
This paper provides a systematic, lifecycle-based framework for analyzing security threats and defenses across the entire fine-tuning process of LLMs, revealing that attack effectiveness is highly mod…
QML-PipeGuard introduces a contract-based framework that monitors the behavioral fingerprint of quantum machine learning pipelines to detect both hardware drift and malicious channel substitution.
The paper proposes evaluating certified training methods by comparing their Pareto fronts across the natural-certified accuracy trade-off, revealing superior performance and previously unappreciated c…
The paper proposes a universal robustification framework to enhance drift-adaptive malware detectors against combined concept drift and adversarial attacks, significantly reducing attack success rates…
The paper introduces a novel, transferable learned attack (LT-MIA) that detects a universal 'signature of memorization' in language models, achieving high accuracy across diverse model architectures (…
The paper proposes an attestation-aware promotion gate to mitigate supply-chain risks in LLM pipelines by cryptographically verifying and enforcing claims about training and release artifacts before d…
The paper introduces Zero-Run privacy auditing, a post-hoc framework that allows for practical differential privacy evaluation of large, deployed models without requiring retraining or controlled data…
Wenhao Lan, Shan Li, Xinhua Lai, Meiqi Wu +3 more
The paper investigates how dynamic adversarial fine-tuning (R2D2) reorganizes the internal mechanisms (refusal geometry) of safety-aligned language models, finding that it shifts the optimal refusal c…
QCIVET introduces a novel contract-based framework to ensure the integrity of hybrid quantum-classical pipelines by verifying both the structure (syntactic) and the behavior (semantic) of quantum stag…
Ahmed Sabbah, Mohammed Kharma, Radi Jarrar, Samer Zein +1 more
This study longitudinally evaluates the adversarial robustness of Android malware detection systems over a decade, finding that temporal separation significantly degrades robustness due to concept dri…
Sangyeon Yoon, Wonje Jeung, Yoonjun Cho, Dongjae Jeon +1 more
The paper introduces a truly benign Direct Preference Optimization (DPO) attack that can jailbreak large language models (LLMs) by fine-tuning them with minimal, harmless preference data, thereby supp…
Guoxin Lu, Letian Sha, Qing Wang, Peijie Sun +3 more
The paper introduces Safety Bottleneck Regularization (SBR), a novel defense mechanism that anchors LLM safety by constraining the unembedding layer, effectively preventing harmful fine-tuning (HFT) e…
This paper analyzes the computational complexity of verifying feedforward neural networks when their weights are restricted to finite-width arithmetic, finding that verification remains NP-complete fo…
The paper introduces Acceptance Cards, a rigorous four-diagnostic standard, to provide a comprehensive and reliable evaluation protocol for claims of safe fine-tuning defenses.
NANOZK introduces a novel, highly efficient zero-knowledge proof system that allows users to cryptographically verify that the output of a large language model (LLM) was generated by a specific, claim…
Zihan Liu, Yizhen Wang, Rui Wang, Xiu Tang +1 more
This survey provides a comprehensive, structured taxonomy of split learning techniques for fine-tuning Large Language Models (LLMs), covering model optimization, system efficiency, and privacy preserv…