Papers similar to 2606.05627

~ similar to 2606.05627 · 19 results

cs.AR · cs.MS · Recent · Jun 3, 2026

GoldenFloat: A Phi-Derived Static-Split Floating-Point Family from GF4 to GF256 with a Lucas-Exact Integer Identity

This paper presents a hardware-oriented description of GoldenFloat, a static-split floating-point family, and its concrete artefacts.

cs.CR · cs.AI · Recent · May 21, 2026

A Constant-Time Implementation Methodology for Activation Functions on Microcontrollers

Andrii Tyvodar, Andreas Rechberger, Dirmanto Jap, Shivam Bhasin +3 more

The paper proposes a constant-time implementation methodology for activation functions on microcontrollers to prevent timing side-channel attacks during embedded neural-network inference.

cs.CC · cs.LG · cs.LO · Recent · May 28, 2026

The Complexity of Verifying Feedforward Neural Networks in Quantised Settings

Eric Alsmann, Martin Lange, Marco Sälzer

This paper analyzes the computational complexity of verifying feedforward neural networks when their weights are restricted to finite-width arithmetic, finding that verification remains NP-complete fo…

cs.LG · cs.AI · cs.CC · Recent · May 28, 2026

Revisiting Padded Transformer Expressivity: Which Architectural Choices Matter and Which Don't

Anej Svete, William Merrill, Ryan Cotterell, Ashish Sabharwal

The paper analyzes the expressivity of padded transformers, proving that their computational power is primarily determined by model depth and numeric precision, rather than attention type or width.

cs.LO · cs.AI · Recent · May 28, 2026

Neural Network Verification using Partial Multi-Neuron Relaxation

Ido Shmuel, Guy Katz

The paper introduces partial multi-neuron relaxation, a novel verification technique that selectively computes tight linear bounds for a small subset of neurons to improve the efficiency and tightness…

cs.AI · cs.LG · Recent · Jun 1, 2026

Extreme Low-Bit Inference in Reasoning Models: Failure Modes and Targeted Recovery

Ekaterina Alimaskina, Darya Rudas, Denis Shveykin, Gleb Molodtsov +2 more

The paper analyzes the failure modes of aggressive 2-bit quantization in large reasoning models, proposing lightweight controls like FP16 planning and loop rescue to restore accuracy and achieve pract…

cs.LG · cs.AI · Recent · May 28, 2026

HARP: Hadamard-Preconditioned Adaptive Rotation Processor for Extreme LLM Quantization

Artur Zagitov, Gleb Molodtsov, Aleksandr Beznosikov

HARP introduces a novel, adaptive, learnable orthogonal processor that significantly improves the robustness and accuracy of extreme low-bit LLM quantization compared to fixed methods.

cs.CR · cs.AR · Recent · Apr 6, 2026

GPU Acceleration of TFHE-Based High-Precision Nonlinear Layers for Encrypted LLM Inference

Guoci Chen, Xiurui Pan, Qiao Li, Bo Mao +4 more

The paper introduces TIGER, a GPU-accelerated framework that significantly speeds up high-precision evaluation of nonlinear layers for encrypted LLM inference using TFHE.

cs.AI · Recent · May 28, 2026

LFQ: Logit-aware Final-block Quantization for Boosting the Generation Quality of Low-Bit Quantized LLMs

Jung Hyun Lee, June Yong Yang, Jungwook Choi, Eunho Yang

The paper introduces Logit-aware Final-block Quantization (LFQ), an enhancement to block-wise quantization that quantizes the final Transformer block using a cross-entropy loss to significantly boost…

cs.CR · cs.LG · Recent · May 21, 2026

Decision-Aware Quadratic ReLU Replacement for HE-Friendly Inference

Rui Li, Wenyuan Wu, Weijie Miao

The paper proposes a decision-aware quadratic replacement for the ReLU activation function, enabling low-degree, calibration-lossless polynomial approximations for neural network inference under Fully…

cs.CR · Recent · Apr 16, 2026

Structural Dependency Analysis for Masked NTT Hardware: Scalable Pre-Silicon Verification of Post-Quantum Cryptographic Accelerators

Ray Iskander, Khaled Kirah

The paper introduces a four-stage structural dependency analysis hierarchy that enables scalable, sound first-order masking verification for large, production-level post-quantum cryptographic accelera…

cs.CR · Recent · Apr 17, 2026

Low-Stack HAETAE for Memory-Constrained Microcontrollers

Gustavo Banegas, Kim Youngbeom, Seo Seog Chung, Vredendaal Christine Van

The paper presents a highly optimized, low-stack implementation of the HAETAE signature scheme, reducing peak stack usage significantly to enable its use on severely memory-constrained microcontroller…

cs.AR · cs.AI · cs.DC · Recent · May 28, 2026

Memory-Bound but Not Bandwidth-Limited: The Physical AI Inference Gap in Batch-1 LLM Decode

Josef Chen

Physical AI inference (batch-1 decode) is primarily memory-bandwidth-bound, but the observed latency gap between fast and slow GPUs is not solely due to memory bandwidth, as launch-side overheads beco…

cs.CV · cs.AI · cs.LG · Recent · May 27, 2026

Do We Really Need Quantum Machine Learning?: A Multidimensional Empirical Study

Sudip Vhaduri, Ryan Gammon, Sayanton Dibbo

This study empirically benchmarks classical and quantum machine learning models for image recognition, finding that while quantum models offer superior accuracy and resource efficiency at high dimensi…

cs.CR · cs.AR · cs.LG · Recent · Mar 20, 2026

Hawkeye: Reproducing GPU-Level Non-Determinism

Erez Badash, Dan Boneh, Ilan Komargodski, Megha Srivastava

Hawkeye is a system that allows perfect, precision-preserving reproduction of GPU-level matrix multiplication operations on a CPU, enabling efficient and trustworthy third-party auditing of machine le…

cs.AR · cs.ET · Recent · May 27, 2026

Nonvolatile Charge-Domain Attention with HZO Ferroelectric Capacitors: A Simulation-Based Device-to-System Evaluation

Faris Abouagour

The paper proposes a Ferroelectric Charge-Domain Compute Cell (FCDC) using HZO memcapacitors to perform attention computation, achieving significant energy efficiency gains, especially for long-reside…

cs.CR · cs.AR · cs.PF · Recent · Jun 1, 2026

Implementation and Optimization of HQC Decoding on NPU-Integrated Devices

Vu Minh Chau, Nguyen Ngoc Kiet, Pham Quang Minh, Mai Xuan Ngoc +2 more

This paper optimizes the decoding of Hamming Quasi-Cyclic (HQC) codes for post-quantum cryptography on NPU-integrated mobile devices by redesigning the core kernels to leverage the Hexagon Vector eXte…

cs.CR · cs.AR · cs.PF · Recent · Jun 1, 2026

Implementation and Optimization of HQC Decoding on NPU-Integrated Devices

Vu Minh Chau, Nguyen Ngoc Kiet, Pham Quang Minh, Mai Xuan Ngoc +2 more

This paper optimizes the decoding of Hamming Quasi-Cyclic (HQC) codes for post-quantum cryptography on NPU-integrated mobile devices by redesigning the kernels to leverage the Hexagon Vector eXtension…

cs.CR · Recent · Apr 21, 2026

Efficient Arithmetic-and-Comparison Homomorphic Encryption with Space Switching

Erwin Eko Wahyudi, Yan Solihin, Qian Lou

The paper proposes a novel space switching method to efficiently unify arithmetic and comparison operations within Fully Homomorphic Encryption (FHE) schemes, achieving significant performance improve…
