Papers similar to 2603.20421v2

Similar to 2603.20421v2 · 20 results

cs.CR · cs.DC · Recent · May 31, 2026

GPU Acceleration of Learning With Errors KEMs Using OpenACC for Post-Quantum Cryptography

Tiziana Liberati, Nitin Shukla, Matteo Barbieri, Gabriella Bettonte +4 more

This paper presents a GPU-accelerated implementation of a Learning with Errors (LWE)-based Key Encapsulation Mechanism (KEM), demonstrating significant speedups and energy efficiency gains on modern G…

cs.CR · cs.LG · Recent · May 29, 2026

Bit-Exact AI Inference Verification Without Performance Tradeoffs

Naci Cankaya

The paper proposes a method for bit-exact verification of AI inference outputs without sacrificing performance, demonstrating that deterministic, precise re-computation is possible even across differe…

cs.PF · cs.AR · cs.DC · Recent · May 27, 2026

Rotary GPU: Exploring Local Execution Paths for Large Mixture-of-Experts Models Under Limited GPU Memory

Myeong Jun Jo

The paper introduces Rotary GPU, an exploratory execution approach demonstrating that large Mixture-of-Experts models can be run locally on consumer GPUs with limited VRAM, achieving usable decode thr…

cs.AI · cs.CR · Recent · Mar 26, 2026

On the Foundations of Trustworthy Artificial Intelligence

TJ Dunham

The paper proves that platform-deterministic inference is a necessary and sufficient condition for trustworthy AI, establishing that AI trust fundamentally relies on consistent arithmetic.

cs.LG · cs.AI · Recent · May 29, 2026

GPU Forecasters: Language Models as Selective Surrogates for Kernel Runtime Optimization

Zaid Khan, Justin Chih-Yao Chen, Jaemin Cho, Elias Stengel-Eskin +1 more

This paper demonstrates that Large Language Models (LLMs) can serve as accurate and selective surrogates for costly GPU kernel performance measurements, significantly expanding the search space for op…

cs.CR · cs.AI · cs.DC · Recent · Apr 3, 2026

AEGIS: Scaling Long-Sequence Homomorphic Encrypted Transformer Inference via Hybrid Parallelism on Multi-GPU Systems

Zhaoting Gong, Ran Ran, Fan Yao, Wujie Wen

AEGIS is a novel system that significantly improves the scalability of running large, long-sequence Transformer models under Fully Homomorphic Encryption (FHE) on multi-GPU systems by optimizing data…

cs.CR · cs.AR · Recent · Apr 6, 2026

GPU Acceleration of TFHE-Based High-Precision Nonlinear Layers for Encrypted LLM Inference

Guoci Chen, Xiurui Pan, Qiao Li, Bo Mao +4 more

The paper introduces TIGER, a GPU-accelerated framework that significantly speeds up high-precision evaluation of nonlinear layers for encrypted LLM inference using TFHE.

cs.AR · cs.PF · Recent · May 30, 2026

Regular-Dead on Arrival: Characterizing and Protecting Against Dead-Entry TLB Misses in GPU Microarchitectures

Shafayat Mowla Anik, Yongchan Jung, Jeeho Ryoo, Byeong Kil Lee

The paper characterizes 'dead-entry' TLB misses in GPUs, which occur when recently evicted translations are immediately re-walked, and proposes DEPOT, a Bloom filter mechanism that significantly reduc…

cs.CR · cs.CL · Recent · Apr 28, 2026

The Surprising Universality of LLM Outputs: A Real-Time Verification Primitive

Alex Bogdan, Adrian de Valois-Franklin

The paper identifies a universal, statistically predictable distribution (Mandelbrot) governing LLM outputs, enabling a highly efficient, model-agnostic scoring primitive for provenance and quality as…

cs.CR · Recent · Apr 2, 2026

AI-Assisted Hardware Security Verification: A Survey and AI Accelerator Case Study

Khan Thamid Hasan, Md Ajoad Hasan, Nashmin Alam, Md. Touhidul Islam +2 more

This survey reviews the integration of AI and LLMs into hardware security verification, demonstrating its potential to automate complex stages while stressing the necessity of grounding AI outputs in…

cs.CR · cs.AR · Recent · Mar 28, 2026

Attacking AI Accelerators by Leveraging Arithmetic Properties of Addition

Masoud Heidary, Biresh Kumar Joardar

The paper introduces a novel hardware aging attack that exploits the commutative properties of addition to induce unbalanced stress on AI accelerator transistors, significantly degrading model accurac…

cs.CR · cs.CY · cs.DC · Recent · Jun 3, 2026

The Usefulness Gap in Proof-of-Useful-Work: An Empirical Study of Pearl's cuPOW Protocol

Abhinaba Basu

This empirical study of Pearl's cuPOW protocol demonstrates that the network's Proof-of-Useful-Work mechanism generates zero useful AI computation, instead causing economic harm and displacing legitim…

cs.CR · Recent · May 3, 2026

GPU Fingerprinting for Location Verification

Wayne Tee, Jonathan Happel

The paper proposes using hardware fingerprints instead of vulnerable cryptographic keys to enhance the security and robustness of GPU location verification for governing advanced AI development.

cs.CR · Recent · May 13, 2026

HE-PIM: Demystifying Homomorphic Operations on a Real-world Processing-in-Memory System

Harshita Gupta, Mayank Kabra, Jaewoo Park, Priyam Mehta +8 more

The paper characterizes Homomorphic Encryption (HE) operations on a real-world Processing-In-Memory (PIM) system, demonstrating that while PIM is a viable alternative to CPUs/GPUs, performance is limi…

cs.CR · Recent · May 5, 2026

GPUBreach: Privilege Escalation Attacks on GPUs using Rowhammer

Chris S. Lin, Yuqin Yan, Guozhen Ding, Joyce Qu +3 more

This paper demonstrates a novel GPU-side privilege escalation attack, showing that Rowhammer can be used to target and tamper with page tables to gain unauthorized access to co-tenant memory and ultim…

cs.CV · cs.AI · Recent · May 29, 2026

SUPREME: A Multi-GPU Framework for Reproducible Image Unlearning Method Evaluation

Petros Andreou, Jamie Lanyon, Axel Finke, Georgina Cosma

SUPREME is an open-source, multi-GPU framework designed to efficiently and reproducibly evaluate machine unlearning methods for image classification by distributing computationally intensive tasks acr…

cs.CR · cs.DC · cs.DS · Recent · Apr 13, 2026

GPU Acceleration of Sparse Fully Homomorphic Encrypted DNNs

Lara D'Agata, Carlos Agulló-Domingo, Óscar Vera-López, Kaustubh Shivdikar +6 more

The paper proposes a novel, optimized sparse matrix multiplication method for fully homomorphic encrypted deep neural networks, achieving up to a 3.0x speedup on AMD GPUs compared to CPU implementatio…

cs.LO · cs.CE · cs.ET · Recent · Jun 1, 2026

Federated Formal Verification: Cross-Backend Citation, Cross-Axis Convergence, and AI-Orchestrated Proof Dispatch for Production Systems

Pierre Falda

The paper proposes a federated formal verification architecture that treats verification as a polyglot proof system, successfully validating it on complex production subsystems like a Raft consensus m…

cs.PF · cs.AR · cs.DC · Recent · May 28, 2026

From Roofline to Ruggedness: Decomposing and Smoothing the GEMM Performance Landscape

Aditya Chatterjee

The paper introduces performance ruggedness analysis to quantify performance variance in GEMM workloads, proposing a two-stage software stack that significantly smooths the performance landscape and b…
