~ similar to 2606.01450· 19 results
The elasticAI.explorer is an extensible, unified Python framework that simplifies hardware-aware Neural Architecture Search (NAS) by decoupling search space definition from model implementation and de…
Hawkeye is a system that allows perfect, precision-preserving reproduction of GPU-level matrix multiplication operations on a CPU, enabling efficient and trustworthy third-party auditing of machine le…
The paper systematically characterizes the fault response of the Intel NCS2 accelerator to electromagnetic fault injection, revealing a major degradation mode that is undetectable by standard inferenc…
The paper introduces BLADEI, a hardware-accelerated framework that screens FPGA configuration bitstreams for anomalies in real-time, overcoming the latency bottleneck of traditional software-based det…
Voktho Das, M Zafir Sadik Khan, Jafar Vafaei, Kimia Azar +1 more
The paper proposes a hybrid ASIC+eFPGA architecture to enhance the security and resilience of edge LLM inference accelerators against both runtime and supply-chain attacks.
O-POPE is a novel outer-product engine that accelerates floating-point GEMM by repurposing FPU pipeline registers as buffers, achieving high utilization and improved energy efficiency.
Diana Romero, Mutahar Ali, Momin Ahmad Khan, Habiba Farrukh +2 more
This paper introduces the first backdoor attacks against VLM-based scanpath prediction, demonstrating variable-output attacks that evade detection and survive deployment on edge devices.
This paper introduces a dual-layer side-channel attack framework that exploits the variable workload introduced by dynamic image preprocessing in local Vision-Language Models (VLMs) to infer sensitive…
Tessera introduces a novel hardware architecture that achieves secure, near-line-rate weight streaming for DNNs on UMA edge accelerators by performing cache-line granularity decryption during DRAM fet…
This paper presents a GPU-accelerated implementation of a Learning with Errors (LWE)-based Key Encapsulation Mechanism (KEM), demonstrating significant speedups and energy efficiency gains on modern G…
Physical AI inference (batch-1 decode) is primarily memory-bandwidth-bound, but the observed latency gap between fast and slow GPUs is not solely due to memory bandwidth, as launch-side overheads beco…
This paper develops optimized algorithms and a pipeline architecture for high-throughput, memory-efficient batch processing of encrypted neural network inference, significantly improving performance o…
Jiebin Zhang, Zhenghan Yu, Song Liu, Eugene J. Yu +8 more
DFlare introduces a lightweight layer-wise fusion mechanism to overcome the narrow conditioning bottleneck of existing block diffusion methods, enabling the scaling of draft models and achieving super…
Liang He, Jingbo Wen, Qishi Zhan, Yixiong Chen +3 more
BudgetDraft introduces an acceptance-aware multi-view training method that trains a sparse-KV speculative decoder to maintain high acceptance rates across varying context lengths and sparsity levels,…
This paper presents a unified framework for end-to-end co-design of neural network processors.
This paper presents a novel data-free Membership Inference Attack (MIA) that uses gradient inversion on Standard Cell Library Layouts (SCLLs) to reconstruct sensitive hardware images from intercepted…
Haihang Xia, Xinyu Zhao, Xuecheng Wang, John Goodenough +4 more
This paper proposes and validates a novel hardware architecture, ITP-STDP, to significantly reduce the energy consumption and hardware overhead associated with training Spiking Neural Networks (SNNs).
This paper introduces a novel full-space quantization-driven architecture (FQA) to create highly efficient and accurate hardware approximations of nonlinear activation functions using piecewise polyno…
This paper characterizes the gap between current DNN-based speech enhancement systems and hearing aid constraints, and proposes a lightweight architecture to meet these constraints.