~ similar to 2605.28451· 19 results
This paper presents a hardware-oriented description of GoldenFloat, a static-split floating-point family, and its concrete artefacts.
O-POPE is a novel outer-product engine that accelerates floating-point GEMM by repurposing FPU pipeline registers as buffers, achieving high utilization and improved energy efficiency.
HARP introduces a novel, adaptive, learnable orthogonal processor that significantly improves the robustness and accuracy of extreme low-bit LLM quantization compared to fixed methods.
mmFHE introduces the first system enabling end-to-end mmWave radar sensing using fully homomorphic encryption (FHE), allowing sensitive data processing on untrusted cloud infrastructure while maintain…
Xinjue Wang, Xiuheng Wang, Yejun Zhang, Sergiy A. Vorobyov +2 more
The paper investigates whether using fine-grained, tensorized adapters (CP components) instead of standard LoRA ranks improves the accuracy-budget trade-off in PEFT, finding that while they fill budge…
Hawkeye is a system that allows perfect, precision-preserving reproduction of GPU-level matrix multiplication operations on a CPU, enabling efficient and trustworthy third-party auditing of machine le…
The paper proposes a Ferroelectric Charge-Domain Compute Cell (FCDC) using HZO memcapacitors to perform attention computation, achieving significant energy efficiency gains, especially for long-reside…
This paper provides the first systematic, isolated benchmarks of NIST-standardized post-quantum cryptography (ML-KEM and ML-DSA) on the highly constrained ARM Cortex-M0+ processor, showing performance…
The paper introduces the base-m length codec, a canonical and robust encoding scheme that maps byte strings to lists of residues modulo m, essential for finite-ring cryptosystems.
This paper introduces a novel full-space quantization-driven architecture (FQA) to create highly efficient and accurate hardware approximations of nonlinear activation functions using piecewise polyno…
The paper analyzes the security of a partially masked hardware accelerator for Number Theoretic Transform (NTT) in PQC, demonstrating that the claimed security margins are significantly overestimated…
The paper analyzes the structured CVP distance on the log-unit lattice of cyclotomic fields, significantly reducing the conjectured CDPR factor for the ML-KEM cryptosystem from exponential to sub-poly…
This paper provides an explicit cost analysis of Toom-4 multiplication specifically tailored for the incomplete Number Theoretic Transform (NTT) framework, offering a concrete cost model for hybrid la…
The paper introduces a four-stage structural dependency analysis hierarchy that enables scalable, sound first-order masking verification for large, production-level post-quantum cryptographic accelera…
The paper introduces the linear canonical Riesz potential (LCRP) and analyzes its convergence properties, leveraging these findings to propose a novel, secure, and efficient asymmetric cascaded LCRP m…
This paper empirically evaluates the performance of the Polars DataFrame engine running within Intel SGX2 enclaves, finding that while the overall security overhead is manageable, the performance is s…
This paper demonstrates that neural operators used in digital twins for nuclear systems are highly vulnerable to undetectable, sparse adversarial perturbations, necessitating new robustness guarantees…
Guoci Chen, Xiurui Pan, Qiao Li, Bo Mao +4 more
The paper introduces TIGER, a GPU-accelerated framework that significantly speeds up high-precision evaluation of nonlinear layers for encrypted LLM inference using TFHE.
This paper introduces a dual-layer side-channel attack framework that exploits the variable workload introduced by dynamic image preprocessing in local Vision-Language Models (VLMs) to infer sensitive…