"quasi-stochastic systolic architecture"

20 results for “quasi-stochastic systolic architecture”

CS papers only

Hybrid search: Keyword + semantic, ranked by combined score.ⓘ

Want pure semantic search? Try claim verification →

cs.AREmpiricalRecentJun 10, 2026

BenDi: An Energy-Efficient Quasi-Stochastic Systolic Architecture for Edge Bioelectronics

Bochen Ye, Yihan Pan, Shady Agwa, Themis Prodromakis

This paper presents BenDi, an energy-efficient quasi-stochastic systolic architecture for bioelectronic systems on the edge.

View →

cs.CRcs.ARcs.LGRecentMar 20, 2026

Hawkeye: Reproducing GPU-Level Non-Determinism

Erez Badash, Dan Boneh, Ilan Komargodski, Megha Srivastava

Hawkeye is a system that allows perfect, precision-preserving reproduction of GPU-level matrix multiplication operations on a CPU, enabling efficient and trustworthy third-party auditing of machine le…

View →

cs.ARcs.ETRecentJun 4, 2026

FQA: A Full-Space Quantization-Driven Architecture for Hardware-Efficient Piecewise Approximation of Nonlinear Activation Functions

Chenjun Hao, Feng Yan, Hongbing Pan, Yuxuan Wang

This paper introduces a novel full-space quantization-driven architecture (FQA) to create highly efficient and accurate hardware approximations of nonlinear activation functions using piecewise polyno…

View →

cs.CRcs.FLcs.MSRecentMar 20, 2026

Cellular Automata based Resource Efficient Maximally Equidistributed Pseudo-Random Number Generators

Bhuvaneswari A, Kamalika Bhattacharjee

The paper proposes a novel set of combined cellular automaton (CA)-based pseudo-random number generators (PRNGs) that overcome the weak equidistribution issues of existing CA-based PRNGs, achieving ma…

View →

cs.CRRecentMay 13, 2026

HE-PIM: Demystifying Homomorphic Operations on a Real-world Processing-in-Memory System

Harshita Gupta, Mayank Kabra, Jaewoo Park, Priyam Mehta +8 more

The paper characterizes Homomorphic Encryption (HE) operations on a real-world Processing-In-Memory (PIM) system, demonstrating that while PIM is a viable alternative to CPUs/GPUs, performance is limi…

View →

cs.ARRecentJun 1, 2026

O-POPE: High-Frequency Pipelined Outer Product based GEMM acceleration with minimal buffering overhead

Danilo Cammarata, Angelo Garofalo, Luca Benini

O-POPE is a novel outer-product engine that accelerates floating-point GEMM by repurposing FPU pipeline registers as buffers, achieving high utilization and improved energy efficiency.

View →

cs.ARcs.MSRecentJun 3, 2026

GoldenFloat: A Phi-Derived Static-Split Floating-Point Family from GF4 to GF256 with a Lucas-Exact Integer Identity

Dmitrii Vasiliev

This paper presents a hardware-oriented description of GoldenFloat, a static-split floating-point family, and its concrete artefacts.

View →

cs.ARcs.PFRecentMay 30, 2026

Regular-Activation Concentration: Characterizing Column-Level Output Sparsity Across Diffusion Model Architectures

Dazhi Yang, Shafayat Mowla Anik, Byeong Kil Lee, Jeeho Ryoo

The paper systematically characterizes column-level activation sparsity across various diffusion model architectures, demonstrating that element-level sparsity metrics significantly overestimate the a…

View →

cs.PLcs.CCcs.FLRecentMay 30, 2026

Grid Programs: A Two-Dimensional, Variable-Free Model of Computation

Ezequiel López-Rubio

The paper introduces Grid Programs, a novel, Turing-complete model of computation where programs are two-dimensional arrangements of instructions, fundamentally departing from linear code structures.

View →

cs.CRcs.ARcs.DCRecentMay 19, 2026

Taking Cryptography Out of the Data Path via Near-Memory Processing in DRAM

Nicola Barcarolo, Brahmaiah Gandham, Mohammad Sadrosadati, Roberto Passerone +2 more

This paper investigates the potential of real-world Processing-in-Memory (PIM) architectures, specifically using UPMEM, to accelerate cryptographic algorithms, demonstrating that distributing computat…

View →

cs.PFcs.ARcs.DCRecentMay 27, 2026

Rotary GPU: Exploring Local Execution Paths for Large Mixture-of-Experts Models Under Limited GPU Memory

Myeong Jun Jo

The paper introduces Rotary GPU, an exploratory execution approach demonstrating that large Mixture-of-Experts models can be run locally on consumer GPUs with limited VRAM, achieving usable decode thr…

View →

cs.ARcs.AIcs.NERecentJun 4, 2026

ITP-STDP: An Intrinsic-Timing Power-of-Two Learning Engine for On-Chip SNN Training

Haihang Xia, Xinyu Zhao, Xuecheng Wang, John Goodenough +4 more

This paper proposes and validates a novel hardware architecture, ITP-STDP, to significantly reduce the energy consumption and hardware overhead associated with training Spiking Neural Networks (SNNs).

View →

cs.AImath.OCRecentJun 1, 2026

Stochastic convergence of parallel asynchronous adaptive first-order methods

Serge Gratton, Philippe L. Toint

The paper analyzes a new class of asynchronous adaptive first-order optimization methods and proves their stochastic convergence rate is O(1/sqrt{t}) for non-convex functions.

View →

cs.LGcs.AIcs.ARRecentJun 3, 2026

Uncertainty-Aware End-to-End Co-Design of Neural Network Processors: From Training and Mapping to Fabrication

Yuyang Du, Yujun Huang, Gioele Zardini

This paper presents a unified framework for end-to-end co-design of neural network processors.

View →

cs.ARRecentMay 28, 2026

Precomputed 1D-CNNs for Atrial Fibrillation Detection on Tiny Smart Sensor Systems

Lukas Einhaus, Natalie Maman, Julian Hoever, Andreas Erbslöh +1 more

The paper proposes a novel convolutional block and optimization algorithm to implement resource-efficient 1D-CNNs for atrial fibrillation detection on tiny smart sensor systems, achieving high accurac…

View →

cs.ARRecentMay 29, 2026

A Reconfigurable Computing In-Memory Macro with Charge-sharing-based Weighted Accumulator

Junyi Yang, Shuai Dong, Zhengnan Fu, Hongyang Shang +1 more

The paper proposes a highly reconfigurable 256x128 in-memory computing array that significantly improves efficiency and performance for analog computing by introducing novel components for ADC, weight…

View →

cs.LGcs.AIcs.CCRecentMay 28, 2026

Revisiting Padded Transformer Expressivity: Which Architectural Choices Matter and Which Don't

Anej Svete, William Merrill, Ryan Cotterell, Ashish Sabharwal

The paper analyzes the expressivity of padded transformers, proving that their computational power is primarily determined by model depth and numeric precision, rather than attention type or width.

View →

cs.CRRecentMay 17, 2026

Triple-Hoisted Baby-Step Giant-Step Linear Transformation over CKKS Homomorphic Encryption and Hardware Accelerator

Sajjad Akherati, Xinmiao Zhang

The paper proposes a novel triple-hoisted baby-step giant-step algorithm and a memory-optimized FPGA accelerator to significantly reduce the ciphertext rotations and off-chip memory access latency whe…

View →

cs.ARcs.PFRecentMay 30, 2026

Regular-Dead on Arrival: Characterizing and Protecting Against Dead-Entry TLB Misses in GPU Microarchitectures

Shafayat Mowla Anik, Yongchan Jung, Jeeho Ryoo, Byeong Kil Lee

The paper characterizes 'dead-entry' TLB misses in GPUs, which occur when recently evicted translations are immediately re-walked, and proposes DEPOT, a Bloom filter mechanism that significantly reduc…

View →

cs.ARcs.DBcs.ETRecentJun 2, 2026

ACRONYM: Accelerated Approximate Nearest Neighbor Search in Memory for Dynamic Vector Databases

Md Mizanur Rahaman Nayan, Tianqi Zhang, Flavio Ponzina, Tajana Rosing +1 more

ACRONYM is a novel algorithm-hardware co-designed platform that enables high-recall, continuous approximate nearest neighbor search in memory for dynamic vector databases, achieving massive throughput…

View →