~ similar to 2603.20421v2· 20 results
This paper presents a GPU-accelerated implementation of a Learning with Errors (LWE)-based Key Encapsulation Mechanism (KEM), demonstrating significant speedups and energy efficiency gains on modern G…
The paper proposes a method for bit-exact verification of AI inference outputs without sacrificing performance, demonstrating that deterministic, precise re-computation is possible even across differe…
The paper introduces Rotary GPU, an exploratory execution approach demonstrating that large Mixture-of-Experts models can be run locally on consumer GPUs with limited VRAM, achieving usable decode thr…
The paper proves that platform-deterministic inference is a necessary and sufficient condition for trustworthy AI, establishing that AI trust fundamentally relies on consistent arithmetic.
This paper demonstrates that Large Language Models (LLMs) can serve as accurate and selective surrogates for costly GPU kernel performance measurements, significantly expanding the search space for op…
AEGIS is a novel system that significantly improves the scalability of running large, long-sequence Transformer models under Fully Homomorphic Encryption (FHE) on multi-GPU systems by optimizing data…
Guoci Chen, Xiurui Pan, Qiao Li, Bo Mao +4 more
The paper introduces TIGER, a GPU-accelerated framework that significantly speeds up high-precision evaluation of nonlinear layers for encrypted LLM inference using TFHE.
The paper characterizes 'dead-entry' TLB misses in GPUs, which occur when recently evicted translations are immediately re-walked, and proposes DEPOT, a Bloom filter mechanism that significantly reduc…
The paper identifies a universal, statistically predictable distribution (Mandelbrot) governing LLM outputs, enabling a highly efficient, model-agnostic scoring primitive for provenance and quality as…
This survey reviews the integration of AI and LLMs into hardware security verification, demonstrating its potential to automate complex stages while stressing the necessity of grounding AI outputs in…
The paper introduces a novel hardware aging attack that exploits the commutative properties of addition to induce unbalanced stress on AI accelerator transistors, significantly degrading model accurac…
This empirical study of Pearl's cuPOW protocol demonstrates that the network's Proof-of-Useful-Work mechanism generates zero useful AI computation, instead causing economic harm and displacing legitim…
The paper proposes using hardware fingerprints instead of vulnerable cryptographic keys to enhance the security and robustness of GPU location verification for governing advanced AI development.
Harshita Gupta, Mayank Kabra, Jaewoo Park, Priyam Mehta +8 more
The paper characterizes Homomorphic Encryption (HE) operations on a real-world Processing-In-Memory (PIM) system, demonstrating that while PIM is a viable alternative to CPUs/GPUs, performance is limi…
Chris S. Lin, Yuqin Yan, Guozhen Ding, Joyce Qu +3 more
This paper demonstrates a novel GPU-side privilege escalation attack, showing that Rowhammer can be used to target and tamper with page tables to gain unauthorized access to co-tenant memory and ultim…
SUPREME is an open-source, multi-GPU framework designed to efficiently and reproducibly evaluate machine unlearning methods for image classification by distributing computationally intensive tasks acr…
The paper proposes a novel, optimized sparse matrix multiplication method for fully homomorphic encrypted deep neural networks, achieving up to a 3.0x speedup on AMD GPUs compared to CPU implementatio…
The paper proposes a federated formal verification architecture that treats verification as a polyglot proof system, successfully validating it on complex production subsystems like a Raft consensus m…
The paper introduces performance ruggedness analysis to quantify performance variance in GEMM workloads, proposing a two-stage software stack that significantly smooths the performance landscape and b…