Papers similar to 2606.01450

19 results

cs.AR · May 28, 2026

elasticAI.explorer: Towards a Unified End-to-End Framework for Hardware-Aware Neural Architecture Search

Natalie Maman, Florian Hettstedt, Andreas Erbslöh, Gregor Schiele

The elasticAI.explorer is an extensible, unified Python framework that simplifies hardware-aware Neural Architecture Search (NAS) by decoupling search space definition from model implementation and de…

cs.CR, cs.AR, cs.LG · Mar 20, 2026

Hawkeye: Reproducing GPU-Level Non-Determinism

Erez Badash, Dan Boneh, Ilan Komargodski, Megha Srivastava

Hawkeye is a system that allows perfect, precision-preserving reproduction of GPU-level matrix multiplication operations on a CPU, enabling efficient and trustworthy third-party auditing of machine le…

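Background for the Hawkeye entry: floating-point addition is not associative, so two matrix multiplications that accumulate the same products in a different order (for example, different GPU kernels, or a GPU versus a CPU) can produce different bits. The minimal sketch below is a generic illustration of that effect, not Hawkeye's method; the function names and input values are chosen only for illustration.

```python
def sequential_sum(values):
    """Left-to-right accumulation, as a single thread would do it."""
    total = 0.0
    for v in values:
        total += v
    return total


def pairwise_sum(values):
    """Tree (pairwise) reduction, as a parallel reduction might do it."""
    if len(values) == 1:
        return values[0]
    mid = len(values) // 2
    return pairwise_sum(values[:mid]) + pairwise_sum(values[mid:])


# Near 1e16 the spacing between doubles is 2, so adding 1.0 rounds away
# in some orders but not in others.
values = [1e16, 1.0, -1e16, 1.0]
print(sequential_sum(values))  # 1.0
print(pairwise_sum(values))    # 0.0
```

Reproducing a result bit-for-bit therefore requires replaying the exact accumulation order (and rounding behavior) of the original computation, not just the same mathematical operation.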

cs.CR, cs.AI, cs.LG · May 21, 2026

Characterizing the Fault Response of the Intel Neural Compute Stick 2 Under Single-Pulse Electromagnetic Fault Injection

Štefan Kučerák, Jakub Breier, Xiaolu Hou

The paper systematically characterizes the fault response of the Intel NCS2 accelerator to electromagnetic fault injection, revealing a major degradation mode that is undetectable by standard inferenc…

cs.CR, cs.ET · May 9, 2026

Hardware-Accelerated Line-Rate Bitstream Screening for Secure FPGA Reconfiguration

Rye Stahle-Smith, Carter Antley, Jason D. Bakos, Rasha Karakchi

The paper introduces BLADEI, a hardware-accelerated framework that screens FPGA configuration bitstreams for anomalies in real time, overcoming the latency bottleneck of traditional software-based det…

cs.CR · Apr 24, 2026

Secure eFPGA-Enabled Edge LLM Inference: Architectural and Hardware Countermeasures

Voktho Das, M Zafir Sadik Khan, Jafar Vafaei, Kimia Azar +1 more

The paper proposes a hybrid ASIC+eFPGA architecture to enhance the security and resilience of edge LLM inference accelerators against both runtime and supply-chain attacks.

cs.AR · Jun 1, 2026

O-POPE: High-Frequency Pipelined Outer Product based GEMM acceleration with minimal buffering overhead

Danilo Cammarata, Angelo Garofalo, Luca Benini

O-POPE is a novel outer-product engine that accelerates floating-point GEMM by repurposing FPU pipeline registers as buffers, achieving high utilization and improved energy efficiency.

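As background for the O-POPE entry: an outer-product formulation computes C = AB as a sum of K rank-1 updates, each one the k-th column of A times the k-th row of B, so every loaded operand is reused across a whole row or column of the output tile. The sketch below only shows that loop ordering in plain Python; it says nothing about how O-POPE pipelines, buffers, or schedules the computation, and the function name is hypothetical.

```python
def outer_product_gemm(a, b):
    """Compute C = A @ B as a sum of rank-1 outer products (illustrative)."""
    m, k_dim = len(a), len(a[0])
    if len(b) != k_dim:
        raise ValueError("inner dimensions of A and B must match")
    n = len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for k in range(k_dim):
        col = [a[i][k] for i in range(m)]  # k-th column of A
        row = b[k]                         # k-th row of B
        # Rank-1 update: col[i] is reused across row i of C,
        # and row[j] across column j of C.
        for i in range(m):
            for j in range(n):
                c[i][j] += col[i] * row[j]
    return c


print(outer_product_gemm([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# [[19.0, 22.0], [43.0, 50.0]]
```

The result matches the usual inner-product (row-times-column) ordering; only the order in which partial products are accumulated changes.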

cs.CR · Apr 9, 2026

Follow My Eyes: Backdoor Attacks on VLM-based Scanpath Prediction

Diana Romero, Mutahar Ali, Momin Ahmad Khan, Habiba Farrukh +2 more

This paper introduces the first backdoor attacks against VLM-based scanpath prediction, demonstrating variable-output attacks that evade detection and survive deployment on edge devices.

cs.CR, cs.AI, cs.LG · Mar 26, 2026

Shape and Substance: Dual-Layer Side-Channel Attacks on Local Vision-Language Models

Eyal Hadad, Mordechai Guri

This paper introduces a dual-layer side-channel attack framework that exploits the variable workload introduced by dynamic image preprocessing in local Vision-Language Models (VLMs) to infer sensitive…

cs.CR, cs.AR, cs.LG · Apr 25, 2026

Tessera: Secure, Near-Line-Rate Weight Streaming for UMA Edge Accelerators

Animan Naskar

Tessera introduces a novel hardware architecture that achieves secure, near-line-rate weight streaming for DNNs on UMA edge accelerators by performing cache-line granularity decryption during DRAM fet…

cs.CR, cs.DC · May 31, 2026

GPU Acceleration of Learning With Errors KEMs Using OpenACC for Post-Quantum Cryptography

Tiziana Liberati, Nitin Shukla, Matteo Barbieri, Gabriella Bettonte +4 more

This paper presents a GPU-accelerated implementation of a Learning with Errors (LWE)-based Key Encapsulation Mechanism (KEM), demonstrating significant speedups and energy efficiency gains on modern G…

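As background for the LWE entry, the sketch below is textbook Regev-style bit encryption, the building block that LWE-based KEMs wrap. It is not the KEM or the OpenACC implementation from the paper. The parameters are tiny and insecure and were chosen only so the correctness argument is easy to check (the modulus 3329 happens to be the one ML-KEM uses; everything else here is a toy). All function names are hypothetical.

```python
import random

N, M, Q = 16, 32, 3329  # secret dimension, number of samples, modulus (toy values)


def keygen(rng):
    s = [rng.randrange(Q) for _ in range(N)]
    a = [[rng.randrange(Q) for _ in range(N)] for _ in range(M)]
    e = [rng.choice((-1, 0, 1)) for _ in range(M)]  # small noise
    b = [(sum(a[i][j] * s[j] for j in range(N)) + e[i]) % Q for i in range(M)]
    return s, (a, b)


def encrypt(pk, bit, rng):
    a, b = pk
    r = [rng.randrange(2) for _ in range(M)]  # random 0/1 subset of samples
    u = [sum(r[i] * a[i][j] for i in range(M)) % Q for j in range(N)]
    v = (sum(r[i] * b[i] for i in range(M)) + bit * (Q // 2)) % Q
    return u, v


def decrypt(s, ct):
    u, v = ct
    # v - <u, s> = <r, e> + bit * (Q // 2) (mod Q), and |<r, e>| <= M < Q / 4,
    # so the bit is recovered by checking whether d lies near Q / 2.
    d = (v - sum(u[j] * s[j] for j in range(N))) % Q
    return 1 if Q // 4 < d < 3 * Q // 4 else 0


rng = random.Random(0)
s, pk = keygen(rng)
for bit in (0, 1):
    assert decrypt(s, encrypt(pk, bit, rng)) == bit
```

The matrix-vector products in `keygen` and `encrypt` are the kind of data-parallel arithmetic that GPU offloading generally targets; real schemes use structured (module or ring) lattices and much larger dimensions.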

cs.AR, cs.AI, cs.DC · May 28, 2026

Memory-Bound but Not Bandwidth-Limited: The Physical AI Inference Gap in Batch-1 LLM Decode

Josef Chen

Physical AI inference (batch-1 decode) is primarily memory-bandwidth-bound, but the observed latency gap between fast and slow GPUs is not solely due to memory bandwidth, as launch-side overheads beco…

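The first clause of this summary follows from simple arithmetic: at batch size 1, generating one token reads essentially every weight once, so per-token time is bounded below by weight bytes divided by memory bandwidth. The back-of-envelope sketch below adds a fixed per-kernel launch cost as an assumed additive term, to show how such overheads can make the latency ratio between a fast and a slow GPU fall short of their bandwidth ratio. The model, the function names, and all numbers are hypothetical, not taken from the paper, and KV-cache traffic and compute time are ignored.

```python
def bandwidth_floor_s(num_params, bytes_per_param, bandwidth_bytes_per_s):
    """Lower bound per token: every weight is read from memory once."""
    return num_params * bytes_per_param / bandwidth_bytes_per_s


def modeled_token_latency_s(num_params, bytes_per_param, bandwidth_bytes_per_s,
                            kernel_launches, launch_overhead_s):
    """Bandwidth floor plus a fixed cost per kernel launch (assumed additive)."""
    return (bandwidth_floor_s(num_params, bytes_per_param, bandwidth_bytes_per_s)
            + kernel_launches * launch_overhead_s)


# Hypothetical 7B-parameter model with 16-bit weights, 600 launches per token,
# 5 microseconds per launch, on GPUs with 3 TB/s and 1 TB/s of bandwidth.
fast = modeled_token_latency_s(7e9, 2, 3.0e12, 600, 5e-6)
slow = modeled_token_latency_s(7e9, 2, 1.0e12, 600, 5e-6)
print(slow / fast)  # roughly 2.2, below the 3x bandwidth ratio
```

Because the launch term is identical on both devices, it dilutes the bandwidth advantage of the faster GPU as it grows relative to the bandwidth floor.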

cs.CR, cs.LG · Apr 18, 2026

Towards Deep Encrypted Training: Low-Latency, Memory-Efficient, and High-Throughput Inference for Privacy-Preserving Neural Networks

Nges Brian Njungle, Eric Jahns, Michel A. Kinsy

This paper develops optimized algorithms and a pipeline architecture for high-throughput, memory-efficient batch processing of encrypted neural network inference, significantly improving performance o…

cs.CL · Jun 1, 2026

DFlare: Scaling Up Draft Capacity for Block Diffusion Speculative Decoding

Jiebin Zhang, Zhenghan Yu, Song Liu, Eugene J. Yu +8 more

DFlare introduces a lightweight layer-wise fusion mechanism to overcome the narrow conditioning bottleneck of existing block diffusion methods, enabling the scaling of draft models and achieving super…

cs.LG, cs.AI · May 29, 2026

BudgetDraft: Acceptance-Aware Multi-View Training for Sparse-KV Speculative Decoding

Liang He, Jingbo Wen, Qishi Zhan, Yixiong Chen +3 more

BudgetDraft introduces an acceptance-aware multi-view training method that trains a sparse-KV speculative decoder to maintain high acceptance rates across varying context lengths and sparsity levels,…

cs.LG, cs.AI, cs.AR · Jun 3, 2026

Uncertainty-Aware End-to-End Co-Design of Neural Network Processors: From Training and Mapping to Fabrication

Yuyang Du, Yujun Huang, Gioele Zardini

This paper presents a unified framework for end-to-end co-design of neural network processors.

cs.CR · Apr 21, 2026

A Data-Free Membership Inference Attack on Federated Learning in Hardware Assurance

Gijung Lee, Wavid Bowman, Olivia P. Dizon-Paradis, Reiner N. Dizon-Paradis +3 more

This paper presents a novel data-free Membership Inference Attack (MIA) that uses gradient inversion on Standard Cell Library Layouts (SCLLs) to reconstruct sensitive hardware images from intercepted…

cs.AR, cs.AI, cs.NE · Jun 4, 2026

ITP-STDP: An Intrinsic-Timing Power-of-Two Learning Engine for On-Chip SNN Training

Haihang Xia, Xinyu Zhao, Xuecheng Wang, John Goodenough +4 more

This paper proposes and validates a novel hardware architecture, ITP-STDP, to significantly reduce the energy consumption and hardware overhead associated with training Spiking Neural Networks (SNNs).

cs.AR, cs.ET · Jun 4, 2026

FQA: A Full-Space Quantization-Driven Architecture for Hardware-Efficient Piecewise Approximation of Nonlinear Activation Functions

Chenjun Hao, Feng Yan, Hongbing Pan, Yuxuan Wang

This paper introduces a novel full-space quantization-driven architecture (FQA) to create highly efficient and accurate hardware approximations of nonlinear activation functions using piecewise polyno…

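As background for the FQA entry, a piecewise approximation splits the input range into segments and evaluates a cheap per-segment expression selected by a segment index, which hardware can derive from the high-order bits of a fixed-point input. The sketch uses uniform segments with first-degree (linear) pieces on the sigmoid; FQA's own quantization-driven segmentation and polynomial order are not reproduced, and the range, segment count, target function, and names are all illustrative assumptions.

```python
import math

LO, HI, SEGMENTS = -8.0, 8.0, 32  # assumed input range and segment count
STEP = (HI - LO) / SEGMENTS


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# Per-segment (slope, intercept) table built by interpolating segment endpoints.
TABLE = []
for i in range(SEGMENTS):
    x0 = LO + i * STEP
    slope = (sigmoid(x0 + STEP) - sigmoid(x0)) / STEP
    TABLE.append((slope, sigmoid(x0) - slope * x0))


def pwl_sigmoid(x):
    if x < LO:
        return 0.0  # saturate below the range
    if x >= HI:
        return 1.0  # saturate above the range
    # Quantize the input to a segment index; clamp guards against rounding at HI.
    idx = min(int((x - LO) / STEP), SEGMENTS - 1)
    slope, intercept = TABLE[idx]
    return slope * x + intercept


max_err = max(abs(pwl_sigmoid(x / 100) - sigmoid(x / 100)) for x in range(-1000, 1001))
print(max_err)
```

With linear pieces the interpolation error per segment is at most STEP² / 8 times the largest |sigmoid''| (about 0.096), so halving STEP cuts that error roughly fourfold at the cost of a table twice as large.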

cs.SD, cs.AR, eess.AS · Jun 2, 2026

Feasibility of Time-Domain DNN-Based Speech Enhancement on Embedded FPGA for Hearing Aid

Feyisayo Olalere, Umut Altin, Kiki van der Heijden, Marcel van Gerven

This paper characterizes the gap between current DNN-based speech enhancement systems and hearing aid constraints, and proposes a lightweight architecture to meet these constraints.
