Papers similar to 2606.02358

~ similar to 2606.02358· 19 results

cs.CRRecentApr 24, 2026

Secure eFPGA-Enabled Edge LLM Inference: Architectural and Hardware Countermeasures

Voktho Das, M Zafir Sadik Khan, Jafar Vafaei, Kimia Azar +1 more

The paper proposes a hybrid ASIC+eFPGA architecture to enhance the security and resilience of edge LLM inference accelerators against both runtime and supply-chain attacks.

View →

cs.ARRecentJun 1, 2026

O-POPE: High-Frequency Pipelined Outer Product based GEMM acceleration with minimal buffering overhead

Danilo Cammarata, Angelo Garofalo, Luca Benini

O-POPE is a novel outer-product engine that accelerates floating-point GEMM by repurposing FPU pipeline registers as buffers, achieving high utilization and improved energy efficiency.

View →

cs.CRcs.ETRecentMay 9, 2026

Hardware-Accelerated Line-Rate Bitstream Screening for Secure FPGA Reconfiguration

Rye Stahle-Smith, Carter Antley, Jason D. Bakos, Rasha Karakchi

The paper introduces BLADEI, a hardware-accelerated framework that screens FPGA configuration bitstreams for anomalies in real-time, overcoming the latency bottleneck of traditional software-based det…

View →

cs.CRcs.ARcs.LGRecentApr 25, 2026

Tessera: Secure, Near-Line-Rate Weight Streaming for UMA Edge Accelerators

Animan Naskar

Tessera introduces a novel hardware architecture that achieves secure, near-line-rate weight streaming for DNNs on UMA edge accelerators by performing cache-line granularity decryption during DRAM fet…

View →

cs.CRcs.DCRecentMar 24, 2026

n-VM: A Multi-VM Layer-1 Architecture with Shared Identity and Token State

Jian Sheng Wang

The paper proposes n-VM, a novel Layer-1 architecture that unifies multiple heterogeneous virtual machines (VMs) onto a shared consensus and state layer, solving cross-chain fragmentation issues.

View →

cs.CRcs.AIcs.DCRecentApr 3, 2026

AEGIS: Scaling Long-Sequence Homomorphic Encrypted Transformer Inference via Hybrid Parallelism on Multi-GPU Systems

Zhaoting Gong, Ran Ran, Fan Yao, Wujie Wen

AEGIS is a novel system that significantly improves the scalability of running large, long-sequence Transformer models under Fully Homomorphic Encryption (FHE) on multi-GPU systems by optimizing data…

View →

cs.ARcs.ETRecentMay 27, 2026

Nonvolatile Charge-Domain Attention with HZO Ferroelectric Capacitors: A Simulation-Based Device-to-System Evaluation

Faris Abouagour

The paper proposes a Ferroelectric Charge-Domain Compute Cell (FCDC) using HZO memcapacitors to perform attention computation, achieving significant energy efficiency gains, especially for long-reside…

View →

cs.CRcs.ARcs.PFRecentMar 19, 2026

Benchmarking NIST-Standardised ML-KEM and ML-DSA on ARM Cortex-M0+: Performance, Memory, and Energy on the RP2040

Rojin Chhetri

This paper provides the first systematic, isolated benchmarks of NIST-standardized post-quantum cryptography (ML-KEM and ML-DSA) on the highly constrained ARM Cortex-M0+ processor, showing performance…

View →

cs.ARRecentMay 28, 2026

elasticAI.explorer: Towards a Unified End-to-End Framework for Hardware-Aware Neural Architecture Search

Natalie Maman, Florian Hettstedt, Andreas Erbslöh, Gregor Schiele

The elasticAI.explorer is an extensible, unified Python framework that simplifies hardware-aware Neural Architecture Search (NAS) by decoupling search space definition from model implementation and de…

View →

cs.ARcs.PFRecentMay 30, 2026

Regular-Dead on Arrival: Characterizing and Protecting Against Dead-Entry TLB Misses in GPU Microarchitectures

Shafayat Mowla Anik, Yongchan Jung, Jeeho Ryoo, Byeong Kil Lee

The paper characterizes 'dead-entry' TLB misses in GPUs, which occur when recently evicted translations are immediately re-walked, and proposes DEPOT, a Bloom filter mechanism that significantly reduc…

View →

cs.CRRecentMay 18, 2026

Speed Kills: Exploring Confused Deputy Attacks Through Edge AI Accelerators

Datta Manikanta Sri Hari Danduri, Aravind Kumar Machiry

This paper investigates Confused Deputy Attacks (CDAs) on AI Accelerators (AIAs) and finds that CDA is feasible on most major vendor AIAs, impacting a vast number of devices.

View →

cs.ARcs.AIcs.DCRecentMay 28, 2026

Memory-Bound but Not Bandwidth-Limited: The Physical AI Inference Gap in Batch-1 LLM Decode

Josef Chen

Physical AI inference (batch-1 decode) is primarily memory-bandwidth-bound, but the observed latency gap between fast and slow GPUs is not solely due to memory bandwidth, as launch-side overheads beco…

View →

cs.CRcs.ARcs.DCRecentMay 19, 2026

Taking Cryptography Out of the Data Path via Near-Memory Processing in DRAM

Nicola Barcarolo, Brahmaiah Gandham, Mohammad Sadrosadati, Roberto Passerone +2 more

This paper investigates the potential of real-world Processing-in-Memory (PIM) architectures, specifically using UPMEM, to accelerate cryptographic algorithms, demonstrating that distributing computat…

View →

cs.CRcs.AIcs.LGRecentMay 21, 2026

Characterizing the Fault Response of the Intel Neural Compute Stick 2 Under Single-Pulse Electromagnetic Fault Injection

Štefan Kučerák, Jakub Breier, Xiaolu Hou

The paper systematically characterizes the fault response of the Intel NCS2 accelerator to electromagnetic fault injection, revealing a major degradation mode that is undetectable by standard inferenc…

View →

cs.ARcs.CLcs.CRRecentApr 20, 2026

Enabling AI ASICs for Zero Knowledge Proof

Jianming Tong, Jingtian Dang, Simon Langowski, Tianhao Huang +5 more

The paper introduces MORPH, a framework that reformulates Zero-Knowledge Proof (ZKP) computations to efficiently utilize AI ASICs like TPUs, achieving up to 10x higher throughput on NTT.

View →

cs.DCcs.AIcs.LGRecentMay 31, 2026

Lodestar: An Online-Learning LLM Inference Router

Gangmuk Lim, Wanyu Zhao, Brighten Godfrey, Jiaxin Shan +2 more

Lodestar is a novel online learning-based request routing system that significantly improves LLM inference efficiency by dynamically assigning incoming requests to the optimal GPU instance to minimize…

View →

cs.CRcs.AIcs.DCRecentMay 31, 2026

AMP: A Vendor-Neutral Wire Format for Agent Memory Operations

Thamilvendhan Munirathinam

The paper introduces memorywire, a vendor-neutral JSON-Schema wire format and reference implementation designed to standardize and govern memory operations across disparate agent-memory frameworks.

View →

cs.ARRecentMay 29, 2026

A Reconfigurable Computing In-Memory Macro with Charge-sharing-based Weighted Accumulator

Junyi Yang, Shuai Dong, Zhengnan Fu, Hongyang Shang +1 more

The paper proposes a highly reconfigurable 256x128 in-memory computing array that significantly improves efficiency and performance for analog computing by introducing novel components for ADC, weight…

View →

cs.LGcs.AIcs.DCRecentMay 27, 2026

How Far Can Disaggregation Go? A Design-Space Exploration of Attention-FFN Disaggregation for Efficient MoE LLM Serving

Hanjiang Wu, Abhimanyu Rajeshkumar Bambhaniya, Sarbartha Banerjee, Tuhin Khare +8 more

The paper systematically analyzes the benefits and limits of Attention-FFN Disaggregation (AFD) for Mixture-of-Experts (MoE) LLM serving, demonstrating that AFD is crucial for achieving high throughpu…

View →