Papers similar to 2604.04738v1

~ similar to 2604.04738v1· 20 results

cs.CRcs.AIcs.LGRecentMay 14, 2026

One Step to the Side: Why Defenses Against Malicious Finetuning Fail Under Adaptive Adversaries

Itay Zloczower, Eyal Lenga, Gilad Gressel, Yisroel Mirsky

The paper demonstrates that current defenses against malicious fine-tuning of foundation models are insufficient because they only address fixed attacks, and introduces a unified adaptive attack that…

View →

cs.CRcs.LGRecentMar 19, 2026

Towards Verifiable AI with Lightweight Cryptographic Proofs of Inference

Pranay Anchuri, Matteo Campanelli, Paul Cesaretti, Rosario Gennaro +3 more

The paper introduces a lightweight, sampling-based cryptographic protocol for verifiable AI inference that drastically reduces proving overhead from minutes to milliseconds by leveraging statistical p…

View →

cs.AIcs.CRRecentMar 26, 2026

On the Foundations of Trustworthy Artificial Intelligence

TJ Dunham

The paper proves that platform-deterministic inference is a necessary and sufficient condition for trustworthy AI, establishing that AI trust fundamentally relies on consistent arithmetic.

View →

cs.CRcs.AIRecentApr 30, 2026

Secret Stealing Attacks on Local LLM Fine-Tuning through Supply-Chain Model Code Backdoors

Zi Li, Tian Zhou, Wenze Li, Jingyu Hua +2 more

This paper introduces a novel supply-chain attack that uses model code backdoors to actively steal sensitive secrets from local LLM fine-tuning datasets, bypassing current privacy defenses.

View →

cs.CRcs.AIcs.LGRecentMay 24, 2026

Security in the Fine-Tuning Lifecycle of Large Language Models: Threats, Defenses,Evaluation, and Future Directions

Wenjuan Li, Yitao Liu, Runze Chen, Rajkumar Buyya

This paper provides a systematic, lifecycle-based framework for analyzing security threats and defenses across the entire fine-tuning process of LLMs, revealing that attack effectiveness is highly mod…

View →

quant-phcs.CRcs.LGRecentMay 24, 2026

QML-PipeGuard: Drift-Aware Behavioral Fingerprinting for Quantum Machine Learning Pipeline Integrity

Esra Yeniaras

QML-PipeGuard introduces a contract-based framework that monitors the behavioral fingerprint of quantum machine learning pipelines to detect both hardware drift and malicious channel substitution.

View →

cs.LGcs.AIcs.CVRecentJun 1, 2026

Rethinking Evaluation Paradigms in IBP-based Certified Training

Konstantin Kaulen, Hadar Shavit, Holger H. Hoos

The paper proposes evaluating certified training methods by comparing their Pareto fronts across the natural-certified accuracy trade-off, revealing superior performance and previously unappreciated c…

View →

cs.CRRecentApr 8, 2026

Can Drift-Adaptive Malware Detectors Be Made Robust? Attacks and Defenses Under White-Box and Black-Box Threats

Adrian Shuai Li, Md Ajwad Akil, Elisa Bertino

The paper proposes a universal robustification framework to enhance drift-adaptive malware detectors against combined concept drift and adversarial attacks, significantly reducing attack success rates…

View →

cs.CLcs.CRcs.LGRecentApr 3, 2026

Learning the Signature of Memorization in Autoregressive Language Models

David Ilić, Kostadin Cvejoski, David Stanojević, Evgeny Grigorenko

The paper introduces a novel, transferable learned attack (LT-MIA) that detects a universal 'signature of memorization' in language models, achieving high accuracy across diverse model architectures (…

View →

cs.CRRecentMar 30, 2026

Attesting LLM Pipelines: Enforcing Verifiable Training and Release Claims

Zhuoran Tan, Jeremy Singer, Christos Anagnostopoulos

The paper proposes an attestation-aware promotion gate to mitigate supply-chain risks in LLM pipelines by cryptographically verifying and enforcing claims about training and release artifacts before d…

View →

cs.CRRecentMay 14, 2026

Privacy Auditing with Zero (0) Training Run

Tudor Cebere, Mathieu Even, Linus Bleistein, Aurélien Bellet

The paper introduces Zero-Run privacy auditing, a post-hoc framework that allows for practical differential privacy evaluation of large, deployed models without requiring retraining or controlled data…

View →

cs.LGcs.CLcs.CRRecentApr 29, 2026

Dynamic Adversarial Fine-Tuning Reorganizes Refusal Geometry

Wenhao Lan, Shan Li, Xinhua Lai, Meiqi Wu +3 more

The paper investigates how dynamic adversarial fine-tuning (R2D2) reorganizes the internal mechanisms (refusal geometry) of safety-aligned language models, finding that it shifts the optimal refusal c…

View →

quant-phcs.CRRecentMay 13, 2026

QCIVET: A Quantum--Classical Pipeline Integrity Framework with Contract-Based Subtype Verification and Hash-Chained Audit Traces

Esra Yeniaras, Muhammad Amin Karimov

QCIVET introduces a novel contract-based framework to ensure the integrity of hybrid quantum-classical pipelines by verifying both the structure (syntactic) and the behavior (semantic) of quantum stag…

View →

cs.CRcs.AIcs.LGRecentMay 22, 2026

Adversarial Vulnerability Under Temporal Concept Drift: A Longitudinal Study of Android Malware Detection

Ahmed Sabbah, Mohammed Kharma, Radi Jarrar, Samer Zein +1 more

This study longitudinally evaluates the adversarial robustness of Android malware detection systems over a decade, finding that temporal separation significantly degrades robustness due to concept dri…

View →

cs.CRcs.AIRecentMay 9, 2026

Few-Shot Truly Benign DPO Attack for Jailbreaking LLMs

Sangyeon Yoon, Wonje Jeung, Yoonjun Cho, Dongjae Jeon +1 more

The paper introduces a truly benign Direct Preference Optimization (DPO) attack that can jailbreak large language models (LLMs) by fine-tuning them with minimal, harmless preference data, thereby supp…

View →

cs.CRcs.AIcs.CLRecentMay 7, 2026

Safety Anchor: Defending Harmful Fine-tuning via Geometric Bottlenecks

Guoxin Lu, Letian Sha, Qing Wang, Peijie Sun +3 more

The paper introduces Safety Bottleneck Regularization (SBR), a novel defense mechanism that anchors LLM safety by constraining the unembedding layer, effectively preventing harmful fine-tuning (HFT) e…

View →

cs.CCcs.LGcs.LORecentMay 28, 2026

The Complexity of Verifying Feedforward Neural Networks in Quantised Settings

Eric Alsmann, Martin Lange, Marco Sälzer

This paper analyzes the computational complexity of verifying feedforward neural networks when their weights are restricted to finite-width arithmetic, finding that verification remains NP-complete fo…

View →

cs.CRcs.AIcs.LGRecentMay 11, 2026

Acceptance Cards:A Four-Diagnostic Standard for Safe Fine-Tuning Defense Claims

Phongsakon Mark Konrad, Toygar Tanyel, Serkan Ayvaz

The paper introduces Acceptance Cards, a rigorous four-diagnostic standard, to provide a comprehensive and reliable evaluation protocol for claims of safe fine-tuning defenses.

View →

cs.LGcs.AIcs.CRRecentMar 17, 2026

NANOZK: Layerwise Zero-Knowledge Proofs for Verifiable Large Language Model Inference

Zhaohui Geoffrey Wang

NANOZK introduces a novel, highly efficient zero-knowledge proof system that allows users to cryptographically verify that the output of a large language model (LLM) was generated by a specific, claim…

View →

cs.CRcs.CLcs.DCRecentApr 27, 2026

A Survey on Split Learning for LLM Fine-Tuning: Models, Systems, and Privacy Optimizations

Zihan Liu, Yizhen Wang, Rui Wang, Xiu Tang +1 more

This survey provides a comprehensive, structured taxonomy of split learning techniques for fine-tuning Large Language Models (LLMs), covering model optimization, system efficiency, and privacy preserv…

View →