Computer Architecture

CPU/GPU design, memory systems, and hardware accelerators

20 papers indexed

cs.AR · cs.CV · cs.DC · Empirical · Recent · Jun 30, 2026

FlexViT: A Flexible FPGA-based Accelerator for Edge Vision Transformers

Hubert Dymarkowski, Xingjian Fu, Rappy Saha, Jude Haris +1 more

This paper presents FlexViT, a reconfigurable FPGA accelerator for efficient Vision Transformer (ViT) inference on edge devices, achieving up to 2.74x speedup on accelerator-executed layers.

cs.AR · Recent · May 31, 2026

OpenEye: A Scalable Open-Source Hardware Accelerator for DNNs

Denis Lebold, Hendrik Wöhrle

OpenEye is a scalable, sparsity-aware FPGA-based hardware accelerator designed to efficiently execute common deep neural network operations, demonstrating favorable performance-resource trade-offs acr…

cs.AR · cs.AI · Empirical · Recent · Jul 24, 2026

Sparse by Command: Task-Conditional Compute Skipping for Multi-Task Inference Accelerators

Afzal Ahmad, Gaoyu Mao, Shoubo Hu, Hui-Ling Zhen +3 more

A co-designed hardware-software approach for task-conditional sparsity in multi-task inference models, reducing FLOPs, latency, and energy.

cs.CR · Recent · May 17, 2026

Triple-Hoisted Baby-Step Giant-Step Linear Transformation over CKKS Homomorphic Encryption and Hardware Accelerator

Sajjad Akherati, Xinmiao Zhang

The paper proposes a novel triple-hoisted baby-step giant-step algorithm and a memory-optimized FPGA accelerator to significantly reduce the ciphertext rotations and off-chip memory access latency whe…

cs.AR · Empirical · Recent · Jul 27, 2026

A Heterogeneous Neural Network Accelerator for End-to-End Multitask RF Signal Recognition

Zhifan Song, Haralampos-G. Stratigopoulos, Hassan Aboushady

This paper proposes a heterogeneous neural network accelerator for multi-task RF signal recognition, achieving high accuracy and low latency for automatic modulation recognition, hardware-Trojan cover…

cs.DC · cs.CR · Empirical · Recent · Jun 26, 2026

RAMSES: Secure high-performance computing for sensitive data

Peter Heger, Lech Nieroda, Roland Pabel, Christoph Stollwerk +6 more

RAMSES is a new HPC system that integrates hardware-based memory encryption and state-of-the-art file encryption to deliver high performance and robust security.

cs.PF · cs.AR · cs.DC · Recent · May 27, 2026

Rotary GPU: Exploring Local Execution Paths for Large Mixture-of-Experts Models Under Limited GPU Memory

Myeong Jun Jo

The paper introduces Rotary GPU, an exploratory execution approach demonstrating that large Mixture-of-Experts models can be run locally on consumer GPUs with limited VRAM, achieving usable decode thr…

cs.CV · cs.CR · Recent · May 28, 2026

On-Device Generative AI for GDPR-Compliant Visual Monitoring: Natural Language Alerts from Local Object Detection

Gudrun Schappacher-Tilp, Nicoletta Kaehling, Jan Kornberger, Egon Teiniker

The paper proposes a privacy-preserving visual monitoring system that performs object detection and generates natural language alerts entirely on an edge device, ensuring GDPR compliance by never tran…

cs.SD · cs.AR · eess.AS · Recent · Jun 2, 2026

Feasibility of Time-Domain DNN-Based Speech Enhancement on Embedded FPGA for Hearing Aid

Feyisayo Olalere, Umut Altin, Kiki van der Heijden, Marcel van Gerven

This paper characterizes the gap between current DNN-based speech enhancement systems and hearing aid constraints, and proposes a lightweight architecture to meet these constraints.

cs.LG · cs.AI · Recent · May 31, 2026

HASTE: Hardware-Aware Dynamic Sparse Training for Large Output Spaces

Nasib Ullah, Jinbin Zhang, Jean Lucien Randrianantenaina, Erik Schultheis +1 more

HASTE introduces group-shared fixed fan-in sparsity for multi-label classification, achieving significant wall-clock speedups (up to 25x in backward pass) by enabling efficient GPU execution while mai…

cs.AR · Empirical · Recent · Jul 3, 2026

ArchEval: Measuring AI Agents as Computer Architects

Chenyu Wang, Zishen Wan, Jeffrey Ma, Shvetank Prakash +7 more

This paper introduces ArchEval, a benchmark and platform for evaluating LLM agents on computer architecture design and optimization.

cs.AR · cs.CR · Recent · May 29, 2026

HE^2: A Communication-Light Heterogeneous Architecture for Efficient Fully Homomorphic Encryption

Shangyi Shi, Husheng Han, Zhaoxuan Kan, Yinghao Yang +7 more

The paper proposes HE^2, a novel communication-light heterogeneous accelerator architecture that significantly improves the efficiency of Fully Homomorphic Encryption (FHE) by optimizing dataflow an…

cs.AI · Recent · May 29, 2026

Model-Native Computing Architecture: Envisioning Future System Architecture Through the Lens of Computer Architecture

Hai Lin

The paper proposes the Intelligent Computing Architecture Model (ICAM), a six-layer framework that unifies disparate concepts in model-native computing by viewing the LLM stack through a dual-plane ar…

cs.AR · cs.ET · Empirical · Recent · Jul 3, 2026

AIGOR: A Modular, Event-Driven Neuromorphic Architecture for Configurable SNN Inference

Pierpaolo Perticaroli, Roberto Ammendola, Andrea Biagioni, Ottorino Frezza +9 more

A modular, event-driven neuromorphic architecture for spiking neural network inference is presented, allowing for flexible configuration of neuron model, precision, and partitioning.

cs.CR · cs.DC · Recent · Apr 17, 2026

PoSME: Proof of Sequential Memory Execution via Latency-Bound Pointer Chasing with Causal Hash Binding

David L. Condrey

The paper introduces PoSME, a cryptographic primitive that enforces strict sequential memory execution by chaining data-dependent writes, providing verifiable delay and authorship attestation.

cs.AR · Empirical · Recent · Jul 4, 2026

TileLens: Efficiently Using Large-Granularity Memory Systems with Transparent Two-Dimensional Memory Layout

Jae Hyung Ju, Euijun Chung, Hritvik Taneja, Anish Saxena +3 more

This paper proposes TileLens, a system to mitigate read amplification in Large-Granularity Memory Systems (LGMS) for Large Language Model (LLM) inference by adopting a tile-major layout.

cs.CV · cs.AI · cs.LG · Recent · May 27, 2026

Do We Really Need Quantum Machine Learning?: A Multidimensional Empirical Study

Sudip Vhaduri, Ryan Gammon, Sayanton Dibbo

This study empirically benchmarks classical and quantum machine learning models for image recognition, finding that while quantum models offer superior accuracy and resource efficiency at high dimensi…

cs.AR · Empirical · Recent · Jul 22, 2026

DGNA: Dissecting GPU NUMA Architecture through Microbenchmarking and Data Analysis

Changxi Liu, Yun Chen, Trevor E. Carlson

This paper introduces DGNA, a methodology to unveil the Non-Uniform Memory Access (NUMA) architecture of GPU memory hierarchy through microbenchmarking and data analysis.

cs.CR · Recent · May 13, 2026

HE-PIM: Demystifying Homomorphic Operations on a Real-world Processing-in-Memory System

Harshita Gupta, Mayank Kabra, Jaewoo Park, Priyam Mehta +8 more

The paper characterizes Homomorphic Encryption (HE) operations on a real-world Processing-In-Memory (PIM) system, demonstrating that while PIM is a viable alternative to CPUs/GPUs, performance is limi…
