Computer Architecture
CPU/GPU design, memory systems, and hardware accelerators
20 papers indexed
OpenEye: A Scalable Open-Source Hardware Accelerator for DNNs
OpenEye is a scalable, sparsity-aware FPGA-based hardware accelerator designed to efficiently execute common deep neural network operations, demonstrating favorable performance-resource trade-offs acr…
Triple-Hoisted Baby-Step Giant-Step Linear Transformation over CKKS Homomorphic Encryption and Hardware Accelerator
The paper proposes a novel triple-hoisted baby-step giant-step algorithm and a memory-optimized FPGA accelerator to significantly reduce the ciphertext rotations and off-chip memory access latency whe…
Rotary GPU: Exploring Local Execution Paths for Large Mixture-of-Experts Models Under Limited GPU Memory
The paper introduces Rotary GPU, an exploratory execution approach demonstrating that large Mixture-of-Experts models can be run locally on consumer GPUs with limited VRAM, achieving usable decode thr…
Feasibility of Time-Domain DNN-Based Speech Enhancement on Embedded FPGA for Hearing Aid
This paper characterizes the gap between current DNN-based speech enhancement systems and hearing aid constraints, and proposes a lightweight architecture to meet these constraints.
On-Device Generative AI for GDPR-Compliant Visual Monitoring: Natural Language Alerts from Local Object Detection
The paper proposes a privacy-preserving visual monitoring system that performs object detection and generates natural language alerts entirely on an edge device, ensuring GDPR compliance by never tran…
HASTE: Hardware-Aware Dynamic Sparse Training for Large Output Spaces
HASTE introduces group-shared fixed fan-in sparsity for multi-label classification, achieving significant wall-clock speedups (up to 25x in backward pass) by enabling efficient GPU execution while mai…
PoSME: Proof of Sequential Memory Execution via Latency-Bound Pointer Chasing with Causal Hash Binding
The paper introduces PoSME, a cryptographic primitive that enforces strict sequential memory execution by chaining data-dependent writes, providing verifiable delay and authorship attestation.
HE^2: A Communication-Light Heterogeneous Architecture for Efficient Fully Homomorphic Encryption
Shangyi Shi, Husheng Han, Zhaoxuan Kan, Yinghao Yang +7 more
The paper proposes $HE^2$, a novel communication-light heterogeneous accelerator architecture that significantly improves the efficiency of Fully Homomorphic Encryption (FHE) by optimizing dataflow an…
HE^2: A Communication-Light Heterogeneous Architecture for Efficient Fully Homomorphic Encryption
Shangyi Shi, Husheng Han, Zhaoxuan Kan, Yinghao Yang +7 more
The paper proposes $HE^2$, a novel communication-light heterogeneous accelerator architecture that significantly improves the efficiency of Fully Homomorphic Encryption (FHE) by optimizing dataflow an…
Model-Native Computing Architecture: Envisioning Future System Architecture Through the Lens of Computer Architecture
The paper proposes the Intelligent Computing Architecture Model (ICAM), a six-layer framework that unifies disparate concepts in model-native computing by viewing the LLM stack through a dual-plane ar…
Do We Really Need Quantum Machine Learning?: A Multidimensional Empirical Study
This study empirically benchmarks classical and quantum machine learning models for image recognition, finding that while quantum models offer superior accuracy and resource efficiency at high dimensi…
HE-PIM: Demystifying Homomorphic Operations on a Real-world Processing-in-Memory System
Harshita Gupta, Mayank Kabra, Jaewoo Park, Priyam Mehta +8 more
The paper characterizes Homomorphic Encryption (HE) operations on a real-world Processing-In-Memory (PIM) system, demonstrating that while PIM is a viable alternative to CPUs/GPUs, performance is limi…
ACRONYM: Accelerated Approximate Nearest Neighbor Search in Memory for Dynamic Vector Databases
ACRONYM is a novel algorithm-hardware co-designed platform that enables high-recall, continuous approximate nearest neighbor search in memory for dynamic vector databases, achieving massive throughput…
The Usefulness Gap in Proof-of-Useful-Work: An Empirical Study of Pearl's cuPOW Protocol
This empirical study of Pearl's cuPOW protocol demonstrates that the network's Proof-of-Useful-Work mechanism generates zero useful AI computation, instead causing economic harm and displacing legitim…
Tessera: Secure, Near-Line-Rate Weight Streaming for UMA Edge Accelerators
Tessera introduces a novel hardware architecture that achieves secure, near-line-rate weight streaming for DNNs on UMA edge accelerators by performing cache-line granularity decryption during DRAM fet…
LIPPEN: A Lightweight In-Place Pointer Encryption Architecture for Pointer Integrity
LIPPEN introduces a novel hardware-software co-design that provides strong, zero-overhead pointer encryption for enhanced memory safety, achieving comprehensive pointer integrity and confidentiality.
ITP-STDP: An Intrinsic-Timing Power-of-Two Learning Engine for On-Chip SNN Training
Haihang Xia, Xinyu Zhao, Xuecheng Wang, John Goodenough +4 more
This paper proposes and validates a novel hardware architecture, ITP-STDP, to significantly reduce the energy consumption and hardware overhead associated with training Spiking Neural Networks (SNNs).
Hardware-Accelerated Line-Rate Bitstream Screening for Secure FPGA Reconfiguration
The paper introduces BLADEI, a hardware-accelerated framework that screens FPGA configuration bitstreams for anomalies in real-time, overcoming the latency bottleneck of traditional software-based det…
Securing High-Performance Data Transfers: Implementing AES Encryption in RDMA Systems
Erik Bångsbo, Zakaria Hersi, Anna Benktson, Stefan Holmgren +1 more
This paper proposes and demonstrates a method to secure high-performance RDMA data transfers by implementing AES-128 encryption directly within a programmable network switch, maintaining high throughp…
GPUBreach: Privilege Escalation Attacks on GPUs using Rowhammer
Chris S. Lin, Yuqin Yan, Guozhen Ding, Joyce Qu +3 more
This paper demonstrates a novel GPU-side privilege escalation attack, showing that Rowhammer can be used to target and tamper with page tables to gain unauthorized access to co-tenant memory and ultim…