Papers similar to 2604.15499v1

~ similar to 2604.15499v1· 20 results

cs.CRcs.AIRecentApr 17, 2026

Privacy-Preserving LLMs Routing

Xidong Wu, Yukuan Zhang, Yuqiong Ji, Reza Shirkavand +2 more

The paper proposes PPRoute, a privacy-preserving LLM routing framework that significantly speeds up secure model selection while maintaining high performance comparable to non-private methods.

View →

cs.CRRecentApr 11, 2026

EncFormer: Secure and Efficient Transformer Inference over Encrypted Data

Yufan Zhu, Chao Jin, Khin Mi Mi Aung, Xiaokui Xiao

EncFormer is a novel two-party framework that significantly improves the efficiency and scalability of private Transformer inference by optimizing the combination of Fully Homomorphic Encryption (FHE)…

View →

cs.CRcs.LGRecentMar 19, 2026

Towards Verifiable AI with Lightweight Cryptographic Proofs of Inference

Pranay Anchuri, Matteo Campanelli, Paul Cesaretti, Rosario Gennaro +3 more

The paper introduces a lightweight, sampling-based cryptographic protocol for verifiable AI inference that drastically reduces proving overhead from minutes to milliseconds by leveraging statistical p…

View →

cs.CRRecentMar 31, 2026

Beyond Latency: A System-Level Characterization of MPC and FHE for PPML

Pengzhi Huang, Kiwan Maeng, G. Edward Suh

This paper provides a comprehensive, system-level comparison of MPC and FHE for Privacy-Preserving Machine Learning (PPML) across various models and environments, moving beyond single-metric latency a…

View →

cs.LGcs.CRcs.DCRecentMay 8, 2026

Private Vertical Federated Inference for Time-Series

Lucas Fenaux, Larris Xie, Aditya Bang, Alex Zhang +2 more

The paper proposes a Public/Private Hybrid Head-VFL (PPHH-VFL) architecture that significantly accelerates secure time-series inference by splitting the model head into efficient public and secure pri…

View →

cs.CRRecentMay 6, 2026

Misrouter: Exploiting Routing Mechanisms for Input-Only Attacks on Mixture-of-Experts LLMs

Zekun Fei, Zihao Wang, Weijie Liu, Ruiqi He +3 more

Misrouter introduces an input-only adversarial framework to exploit the routing mechanisms of Mixture-of-Experts (MoE) LLMs, enabling unsafe behavior induction against remotely hosted, black-box servi…

View →

cs.CRRecentMay 28, 2026

Protecting On-Device AI Inference: A Systematic Review of Attacks and Defence Mechanisms

Zisis Tsiatsikas, Alexandros Fakis, Georgios Karopoulos, Vasileios Kouliaridis +1 more

This paper provides the first comprehensive review of threats and defenses specifically targeting on-device AI inference, revealing a significant imbalance where certain attack types, like adversarial…

View →

cs.CRcs.AIRecentMar 20, 2026

Meeting in the Middle: A Co-Design Paradigm for FHE and AI Inference

Bernardo Magri, Benjamin Marsh, Paul Gebheim

The paper proposes a co-design paradigm, 'Meeting in the Middle,' to make Fully Homomorphic Encryption (FHE) practical for AI inference by optimizing both the cryptographic schemes and the underlying…

View →

cs.CRcs.AIcs.CLRecentApr 16, 2026

Route to Rome Attack: Directing LLM Routers to Expensive Models via Adversarial Suffix Optimization

Haochun Tang, Yuliang Yan, Jiahua Lu, Huaxiao Liu +1 more

The paper introduces R$^2$A, an adversarial attack that uses suffix optimization to mislead black-box LLM routers into consistently selecting expensive, high-capability models.

View →

cs.CRcs.CLcs.LGRecentMay 22, 2026

What Does the Server See? Understanding Privacy Leakage from Large Language Models in Split Inference

Mingyuan Fan, Yu Liu, Fuyi Wang, Cen Chen

The paper introduces ActInv and PAF to systematically analyze and quantify privacy leakage from intermediate activations during split inference of LLMs, proposing PriPert for enhanced defense.

View →

cs.CRRecentMay 6, 2026

A Pragmatic Comparison of Cryptographic Computation Technologies for Machine Learning

Marcus Taubert, Adam Skuta, Thomas Loruenser

This paper provides a comparative analysis and benchmarking of Secure Multi-Party Computation (SMPC) and Fully Homomorphic Encryption (FHE) for machine learning, finding that the optimal choice depend…

View →

cs.CRcs.ARcs.CLRecentMay 24, 2026

RouteScan: A Non-Intrusive Approach to Auditing MoE LLMs Safety via Expert Routing Telemetry

Bo Lv, Zhiheng Xu, KeDong Xiu, Ruyi Ding +3 more

RouteScan introduces a non-intrusive framework that audits the safety of Mixture-of-Experts (MoE) LLMs by analyzing low-level GPU expert routing telemetry, achieving high accuracy even on unseen harmf…

View →

cs.CRcs.AIcs.AREmpiricalRecentJul 20, 2026

PRISM: Sensitivity-Aware PolynoMial PRuning for EffIcient Neural Network Encryption

Sahaj Majavdia, Mahdi Taheri

This paper introduces Polynomial-Sensitivity-Aware Pruning (PSAP), a reliability-aware structured pruning method for neural networks under homomorphic encryption, reducing catastrophic layer vulnerabi…

View →

cs.CRcs.AIRecentMay 6, 2026

On the (In-)Security of the Shuffling Defense in the Transformer Secure Inference

Zhengyi Li, Yakai Wang, Kang Yang, Yu Yu +5 more

This paper demonstrates a novel attack against the shuffling defense used in secure Transformer inference, showing that randomly permuted activations can still be exploited to recover model weights.

View →

cs.LGcs.AIcs.CRRecentMar 17, 2026

NANOZK: Layerwise Zero-Knowledge Proofs for Verifiable Large Language Model Inference

Zhaohui Geoffrey Wang

NANOZK introduces a novel, highly efficient zero-knowledge proof system that allows users to cryptographically verify that the output of a large language model (LLM) was generated by a specific, claim…

View →

cs.CRcs.CLcs.IRRecentMay 27, 2026

A Wolf in Sheep's Clothing: Targeted Routing Hijacking in Federated RAG

Junjie Mu, Qiongxiu Li

The paper introduces 'Routing Hijacking,' a severe attack where malicious clients forge semantic profiles in Federated RAG systems to misroute target queries, and proposes a trust-aware post-routing f…

View →

cs.CRcs.LGRecentApr 18, 2026

Towards Deep Encrypted Training: Low-Latency, Memory-Efficient, and High-Throughput Inference for Privacy-Preserving Neural Networks

Nges Brian Njungle, Eric Jahns, Michel A. Kinsy

This paper develops optimized algorithms and a pipeline architecture for high-throughput, memory-efficient batch processing of encrypted neural network inference, significantly improving performance o…

View →

cs.CRcs.ARcs.LGRecentApr 25, 2026

Tessera: Secure, Near-Line-Rate Weight Streaming for UMA Edge Accelerators

Animan Naskar

Tessera introduces a novel hardware architecture that achieves secure, near-line-rate weight streaming for DNNs on UMA edge accelerators by performing cache-line granularity decryption during DRAM fet…

View →

cs.CRcs.AIcs.DCRecentApr 3, 2026

AEGIS: Scaling Long-Sequence Homomorphic Encrypted Transformer Inference via Hybrid Parallelism on Multi-GPU Systems

Zhaoting Gong, Ran Ran, Fan Yao, Wujie Wen

AEGIS is a novel system that significantly improves the scalability of running large, long-sequence Transformer models under Fully Homomorphic Encryption (FHE) on multi-GPU systems by optimizing data…

View →

cs.CRcs.NIRecentMay 24, 2026

Securing High-Performance Data Transfers: Implementing AES Encryption in RDMA Systems

Erik Bångsbo, Zakaria Hersi, Anna Benktson, Stefan Holmgren +1 more

This paper proposes and demonstrates a method to secure high-performance RDMA data transfers by implementing AES-128 encryption directly within a programmable network switch, maintaining high throughp…

View →