"systolic architectures" | ArxivCSExplorer

20 results for “systolic architectures”

CS papers only

Hybrid search: Keyword + semantic, ranked by combined score.

cs.AR · Empirical · Recent · Jun 10, 2026

BenDi: An Energy-Efficient Quasi-Stochastic Systolic Architecture for Edge Bioelectronics

Bochen Ye, Yihan Pan, Shady Agwa, Themis Prodromakis

This paper presents BenDi, an energy-efficient quasi-stochastic systolic architecture for bioelectronic systems on the edge.

cs.CR · Recent · May 14, 2026

Adapting AlphaEvolve to Optimize Fully Homomorphic Encryption on TPUs

Shruthi Gorantala, Jianming Tong, Asra Ali, Baiyu Li +6 more

The paper introduces AlphaEvolve, an evolutionary search framework that automates the optimization of Fully Homomorphic Encryption (FHE) kernels on TPUs, achieving significant speedups over human-engi…

cs.AR · cs.MS · Recent · Jun 3, 2026

GoldenFloat: A Phi-Derived Static-Split Floating-Point Family from GF4 to GF256 with a Lucas-Exact Integer Identity

Dmitrii Vasiliev

This paper presents a hardware-oriented description of GoldenFloat, a static-split floating-point family, and its concrete artefacts.

cs.AR · Recent · Jun 1, 2026

O-POPE: High-Frequency Pipelined Outer Product based GEMM acceleration with minimal buffering overhead

Danilo Cammarata, Angelo Garofalo, Luca Benini

O-POPE is a novel outer-product engine that accelerates floating-point GEMM by repurposing FPU pipeline registers as buffers, achieving high utilization and improved energy efficiency.

cs.CR · cs.AR · cs.DC · Recent · May 19, 2026

Taking Cryptography Out of the Data Path via Near-Memory Processing in DRAM

Nicola Barcarolo, Brahmaiah Gandham, Mohammad Sadrosadati, Roberto Passerone +2 more

This paper investigates the potential of real-world Processing-in-Memory (PIM) architectures, specifically using UPMEM, to accelerate cryptographic algorithms, demonstrating that distributing computat…

cs.CR · cs.AR · cs.LG · Recent · Mar 20, 2026

Hawkeye: Reproducing GPU-Level Non-Determinism

Erez Badash, Dan Boneh, Ilan Komargodski, Megha Srivastava

Hawkeye is a system that allows perfect, precision-preserving reproduction of GPU-level matrix multiplication operations on a CPU, enabling efficient and trustworthy third-party auditing of machine le…

cs.CR · Recent · May 17, 2026

Triple-Hoisted Baby-Step Giant-Step Linear Transformation over CKKS Homomorphic Encryption and Hardware Accelerator

Sajjad Akherati, Xinmiao Zhang

The paper proposes a novel triple-hoisted baby-step giant-step algorithm and a memory-optimized FPGA accelerator to significantly reduce the ciphertext rotations and off-chip memory access latency whe…

cs.CR · Recent · May 13, 2026

HE-PIM: Demystifying Homomorphic Operations on a Real-world Processing-in-Memory System

Harshita Gupta, Mayank Kabra, Jaewoo Park, Priyam Mehta +8 more

The paper characterizes Homomorphic Encryption (HE) operations on a real-world Processing-In-Memory (PIM) system, demonstrating that while PIM is a viable alternative to CPUs/GPUs, performance is limi…

cs.PF · cs.AR · cs.DC · Recent · May 27, 2026

Rotary GPU: Exploring Local Execution Paths for Large Mixture-of-Experts Models Under Limited GPU Memory

Myeong Jun Jo

The paper introduces Rotary GPU, an exploratory execution approach demonstrating that large Mixture-of-Experts models can be run locally on consumer GPUs with limited VRAM, achieving usable decode thr…

cs.AR · Recent · May 28, 2026

elasticAI.explorer: Towards a Unified End-to-End Framework for Hardware-Aware Neural Architecture Search

Natalie Maman, Florian Hettstedt, Andreas Erbslöh, Gregor Schiele

The elasticAI.explorer is an extensible, unified Python framework that simplifies hardware-aware Neural Architecture Search (NAS) by decoupling search space definition from model implementation and de…

cs.CR · Recent · Apr 25, 2026

Core Logic and Algorithmic Performance Enhancements for a System Vulnerability Analysis Technique for Complex Mission Critical Systems Implementation

Matthew Tassava, Cameron Kolodjski, Jordan Milbrath, Jeremy Straub

The paper details significant enhancements to the SONARR system's core logic, replacing restrictive Boolean logic with generic data type support and adding multi-compute capabilities to improve vulner…

cs.CR · cs.AR · Recent · May 27, 2026

HammerSim: A System-Level Tool to Model RowHammer

Kaustav Goswami, Ayaz Akram, Hari Venugopalan, Jason Lowe-Power

HammerSim is a new gem5-based framework that provides full-system visibility to model the RowHammer vulnerability, allowing researchers to study complex OS effects and hardware/software mitigations.

cs.CR · cs.AR · Recent · May 27, 2026

HammerSim: A System-Level Tool to Model RowHammer

Kaustav Goswami, Ayaz Akram, Hari Venugopalan, Jason Lowe-Power

HammerSim is a novel gem5-based framework that provides full-system visibility to model the RowHammer vulnerability, allowing researchers to evaluate complex hardware and software mitigations.

cs.AR · cs.CL · cs.CR · Recent · Apr 20, 2026

Enabling AI ASICs for Zero Knowledge Proof

Jianming Tong, Jingtian Dang, Simon Langowski, Tianhao Huang +5 more

The paper introduces MORPH, a framework that reformulates Zero-Knowledge Proof (ZKP) computations to efficiently utilize AI ASICs like TPUs, achieving up to 10x higher throughput on NTT.

cs.AR · Recent · May 29, 2026

A Reconfigurable Computing In-Memory Macro with Charge-sharing-based Weighted Accumulator

Junyi Yang, Shuai Dong, Zhengnan Fu, Hongyang Shang +1 more

The paper proposes a highly reconfigurable 256x128 in-memory computing array that significantly improves efficiency and performance for analog computing by introducing novel components for ADC, weight…

cs.CR · cs.AR · cs.PF · Recent · Mar 19, 2026

Benchmarking NIST-Standardised ML-KEM and ML-DSA on ARM Cortex-M0+: Performance, Memory, and Energy on the RP2040

Rojin Chhetri

This paper provides the first systematic, isolated benchmarks of NIST-standardized post-quantum cryptography (ML-KEM and ML-DSA) on the highly constrained ARM Cortex-M0+ processor, showing performance…

cs.AR · cs.DB · cs.ET · Recent · Jun 2, 2026

ACRONYM: Accelerated Approximate Nearest Neighbor Search in Memory for Dynamic Vector Databases

Md Mizanur Rahaman Nayan, Tianqi Zhang, Flavio Ponzina, Tajana Rosing +1 more

ACRONYM is a novel algorithm-hardware co-designed platform that enables high-recall, continuous approximate nearest neighbor search in memory for dynamic vector databases, achieving massive throughput…

cs.PL · cs.CC · cs.FL · Recent · May 30, 2026

Grid Programs: A Two-Dimensional, Variable-Free Model of Computation

Ezequiel López-Rubio

The paper introduces Grid Programs, a novel, Turing-complete model of computation where programs are two-dimensional arrangements of instructions, fundamentally departing from linear code structures.

cs.AR · cs.AI · cs.SE · Recent · Jun 2, 2026

HighTide: An Agent-Curated Open-Source VLSI Benchmark Suite

Benjamin Goldblatt, Paolo Pedroso, Farhad Modaresi, Ethan Sifferman +1 more

HighTide is an evolving, AI-assisted, open-source benchmark suite for VLSI design, providing a comprehensive and scalable platform for hardware development.

cs.AR · Recent · May 28, 2026

Precomputed 1D-CNNs for Atrial Fibrillation Detection on Tiny Smart Sensor Systems

Lukas Einhaus, Natalie Maman, Julian Hoever, Andreas Erbslöh +1 more

The paper proposes a novel convolutional block and optimization algorithm to implement resource-efficient 1D-CNNs for atrial fibrillation detection on tiny smart sensor systems, achieving high accurac…
