Similar to 2606.05627 · 19 results
This paper presents a hardware-oriented description of GoldenFloat, a static-split floating-point family, and its concrete artefacts.
The paper proposes a constant-time implementation methodology for activation functions on microcontrollers to prevent timing side-channel attacks during embedded neural-network inference.
This paper analyzes the computational complexity of verifying feedforward neural networks when their weights are restricted to finite-width arithmetic, finding that verification remains NP-complete fo…
The paper analyzes the expressivity of padded transformers, proving that their computational power is primarily determined by model depth and numeric precision, rather than attention type or width.
The paper introduces partial multi-neuron relaxation, a novel verification technique that selectively computes tight linear bounds for a small subset of neurons to improve the efficiency and tightness…
The paper analyzes the failure modes of aggressive 2-bit quantization in large reasoning models, proposing lightweight controls like FP16 planning and loop rescue to restore accuracy and achieve pract…
HARP introduces a novel, adaptive, learnable orthogonal processor that significantly improves the robustness and accuracy of extreme low-bit LLM quantization compared to fixed methods.
Guoci Chen, Xiurui Pan, Qiao Li, Bo Mao +4 more
The paper introduces TIGER, a GPU-accelerated framework that significantly speeds up high-precision evaluation of nonlinear layers for encrypted LLM inference using TFHE.
The paper introduces Logit-aware Final-block Quantization (LFQ), an enhancement to block-wise quantization that quantizes the final Transformer block using a cross-entropy loss to significantly boost…
The paper proposes a decision-aware quadratic replacement for the ReLU activation function, enabling low-degree, calibration-lossless polynomial approximations for neural network inference under Fully…
The paper introduces a four-stage structural dependency analysis hierarchy that enables scalable, sound first-order masking verification for large, production-level post-quantum cryptographic accelera…
The paper presents a highly optimized, low-stack implementation of the HAETAE signature scheme, significantly reducing peak stack usage to enable its use on severely memory-constrained microcontroller…
Physical AI inference (batch-1 decode) is primarily memory-bandwidth-bound, but the observed latency gap between fast and slow GPUs is not solely due to memory bandwidth, as launch-side overheads beco…
This study empirically benchmarks classical and quantum machine learning models for image recognition, finding that while quantum models offer superior accuracy and resource efficiency at high dimensi…
Hawkeye is a system that allows perfect, precision-preserving reproduction of GPU-level matrix multiplication operations on a CPU, enabling efficient and trustworthy third-party auditing of machine le…
The paper proposes a Ferroelectric Charge-Domain Compute Cell (FCDC) using HZO memcapacitors to perform attention computation, achieving significant energy efficiency gains, especially for long-reside…
Vu Minh Chau, Nguyen Ngoc Kiet, Pham Quang Minh, Mai Xuan Ngoc +2 more
This paper optimizes the decoding of Hamming Quasi-Cyclic (HQC) codes for post-quantum cryptography on NPU-integrated mobile devices by redesigning the core kernels to leverage the Hexagon Vector eXte…
Vu Minh Chau, Nguyen Ngoc Kiet, Pham Quang Minh, Mai Xuan Ngoc +2 more
This paper optimizes the decoding of Hamming Quasi-Cyclic (HQC) codes for post-quantum cryptography on NPU-integrated mobile devices by redesigning the kernels to leverage the Hexagon Vector eXtension…
The paper proposes a novel space switching method to efficiently unify arithmetic and comparison operations within Fully Homomorphic Encryption (FHE) schemes, achieving significant performance improve…