Papers similar to 2606.02333

~ similar to 2606.02333· 19 results

cs.ARcs.MSRecentJun 3, 2026

GoldenFloat: A Phi-Derived Static-Split Floating-Point Family from GF4 to GF256 with a Lucas-Exact Integer Identity

This paper presents a hardware-oriented description of GoldenFloat, a static-split floating-point family, and its concrete artefacts.

View →

cs.CRcs.ARcs.LGRecentMar 20, 2026

Hawkeye: Reproducing GPU-Level Non-Determinism

Erez Badash, Dan Boneh, Ilan Komargodski, Megha Srivastava

Hawkeye is a system that allows perfect, precision-preserving reproduction of GPU-level matrix multiplication operations on a CPU, enabling efficient and trustworthy third-party auditing of machine le…

View →

cs.PFcs.ARcs.DCRecentMay 28, 2026

From Roofline to Ruggedness: Decomposing and Smoothing the GEMM Performance Landscape

Aditya Chatterjee

The paper introduces performance ruggedness analysis to quantify performance variance in GEMM workloads, proposing a two-stage software stack that significantly smooths the performance landscape and b…

View →

cs.ARRecentJun 1, 2026

CHIMERA: A Flexible and Scalable 3.1 TOPS/W AI-MCU with Transformer Accelerator and 563 Gb/s Shared-L2 Memory Subsystem with QoS Guarantees

Lorenzo Leone, Philip Wiese, Gamze İslamoğlu, Michael Rogenmoser +3 more

The paper introduces Chimera, a highly efficient and scalable MCU designed for ultra-low-power edge AI inference, achieving 3.1 TOPS/W by integrating a dedicated transformer accelerator and a QoS-guar…

View →

cs.ARcs.CRRecentJun 2, 2026

ZK-Flex: A Flexible and Scalable Framework for Accelerating Zero-Knowledge Proofs

Adiwena Putra, Cuong Manh Duong, Anh Quang Pham, Joo-Young Kim

The paper proposes ZK-Flex, a flexible software-hardware co-designed framework that significantly accelerates Zero-Knowledge Proof (ZKP) generation by efficiently handling diverse polynomial and ellip…

View →

cs.ARcs.CRRecentJun 2, 2026

ZK-Flex: A Flexible and Scalable Framework for Accelerating Zero-Knowledge Proofs

Adiwena Putra, Cuong Manh Duong, Anh Quang Pham, Joo-Young Kim

View →

cs.ARcs.ETRecentMay 27, 2026

Nonvolatile Charge-Domain Attention with HZO Ferroelectric Capacitors: A Simulation-Based Device-to-System Evaluation

Faris Abouagour

The paper proposes a Ferroelectric Charge-Domain Compute Cell (FCDC) using HZO memcapacitors to perform attention computation, achieving significant energy efficiency gains, especially for long-reside…

View →

cs.ARcs.CLcs.CRRecentApr 20, 2026

Enabling AI ASICs for Zero Knowledge Proof

Jianming Tong, Jingtian Dang, Simon Langowski, Tianhao Huang +5 more

The paper introduces MORPH, a framework that reformulates Zero-Knowledge Proof (ZKP) computations to efficiently utilize AI ASICs like TPUs, achieving up to 10x higher throughput on NTT.

View →

cs.CRcs.ARcs.PFRecentMar 19, 2026

Benchmarking NIST-Standardised ML-KEM and ML-DSA on ARM Cortex-M0+: Performance, Memory, and Energy on the RP2040

Rojin Chhetri

This paper provides the first systematic, isolated benchmarks of NIST-standardized post-quantum cryptography (ML-KEM and ML-DSA) on the highly constrained ARM Cortex-M0+ processor, showing performance…

View →

cs.CRRecentMay 13, 2026

HE-PIM: Demystifying Homomorphic Operations on a Real-world Processing-in-Memory System

Harshita Gupta, Mayank Kabra, Jaewoo Park, Priyam Mehta +8 more

The paper characterizes Homomorphic Encryption (HE) operations on a real-world Processing-In-Memory (PIM) system, demonstrating that while PIM is a viable alternative to CPUs/GPUs, performance is limi…

View →

cs.ARcs.PFRecentMay 30, 2026

Regular-Dead on Arrival: Characterizing and Protecting Against Dead-Entry TLB Misses in GPU Microarchitectures

Shafayat Mowla Anik, Yongchan Jung, Jeeho Ryoo, Byeong Kil Lee

The paper characterizes 'dead-entry' TLB misses in GPUs, which occur when recently evicted translations are immediately re-walked, and proposes DEPOT, a Bloom filter mechanism that significantly reduc…

View →

cs.ARRecentMay 31, 2026

OpenEye: A Scalable Open-Source Hardware Accelerator for DNNs

Denis Lebold, Hendrik Wöhrle

OpenEye is a scalable, sparsity-aware FPGA-based hardware accelerator designed to efficiently execute common deep neural network operations, demonstrating favorable performance-resource trade-offs acr…

View →

cs.CRcs.ARcs.DCRecentMay 19, 2026

Taking Cryptography Out of the Data Path via Near-Memory Processing in DRAM

Nicola Barcarolo, Brahmaiah Gandham, Mohammad Sadrosadati, Roberto Passerone +2 more

This paper investigates the potential of real-world Processing-in-Memory (PIM) architectures, specifically using UPMEM, to accelerate cryptographic algorithms, demonstrating that distributing computat…

View →

cs.CRcs.OSRecentMay 30, 2026

Beyond Edge Coverage: Per-Task Data-Flow Extraction at Kernel Function Boundaries via LLVM

Yunseong Kim

The paper introduces BOUNDARY FLOW, an LLVM-based framework that enhances kernel fuzzing and analysis by extracting per-task, state-aware data-flow information (arguments and return values) at functio…

View →

cs.CRRecentApr 16, 2026

Structural Dependency Analysis for Masked NTT Hardware: Scalable Pre-Silicon Verification of Post-Quantum Cryptographic Accelerators

Ray Iskander, Khaled Kirah

The paper introduces a four-stage structural dependency analysis hierarchy that enables scalable, sound first-order masking verification for large, production-level post-quantum cryptographic accelera…

View →

cs.CRcs.ARRecentMar 28, 2026

Attacking AI Accelerators by Leveraging Arithmetic Properties of Addition

Masoud Heidary, Biresh Kumar Joardar

The paper introduces a novel hardware aging attack that exploits the commutative properties of addition to induce unbalanced stress on AI accelerator transistors, significantly degrading model accurac…

View →

cs.CRRecentApr 4, 2026

Partial Number Theoretic Transform Masking in Post-Quantum Cryptography (PQC) Hardware: A Security Margin Analysis

Ray Iskander, Khaled Kirah

The paper analyzes the security of a partially masked hardware accelerator for Number Theoretic Transform (NTT) in PQC, demonstrating that the claimed security margins are significantly overestimated…

View →

cs.CRcs.CYcs.DCRecentJun 3, 2026

The Usefulness Gap in Proof-of-Useful-Work: An Empirical Study of Pearl's cuPOW Protocol

Abhinaba Basu

This empirical study of Pearl's cuPOW protocol demonstrates that the network's Proof-of-Useful-Work mechanism generates zero useful AI computation, instead causing economic harm and displacing legitim…

View →

cs.CRcs.ARcs.LGRecentApr 25, 2026

Tessera: Secure, Near-Line-Rate Weight Streaming for UMA Edge Accelerators

Animan Naskar

Tessera introduces a novel hardware architecture that achieves secure, near-line-rate weight streaming for DNNs on UMA edge accelerators by performing cache-line granularity decryption during DRAM fet…

View →