20 results for “systolic architectures”
CS papers only. Hybrid search: keyword + semantic, ranked by combined score.
This paper presents BenDi, an energy-efficient quasi-stochastic systolic architecture for bioelectronic systems on the edge.
Shruthi Gorantala, Jianming Tong, Asra Ali, Baiyu Li +6 more
The paper introduces AlphaEvolve, an evolutionary search framework that automates the optimization of Fully Homomorphic Encryption (FHE) kernels on TPUs, achieving significant speedups over human-engi…
This paper presents a hardware-oriented description of GoldenFloat, a static-split floating-point family, and its concrete artefacts.
O-POPE is a novel outer-product engine that accelerates floating-point GEMM by repurposing FPU pipeline registers as buffers, achieving high utilization and improved energy efficiency.
This paper investigates the potential of real-world Processing-in-Memory (PIM) architectures, specifically using UPMEM, to accelerate cryptographic algorithms, demonstrating that distributing computat…
Hawkeye is a system that allows perfect, precision-preserving reproduction of GPU-level matrix multiplication operations on a CPU, enabling efficient and trustworthy third-party auditing of machine le…
The paper proposes a novel triple-hoisted baby-step giant-step algorithm and a memory-optimized FPGA accelerator to significantly reduce the ciphertext rotations and off-chip memory access latency whe…
Harshita Gupta, Mayank Kabra, Jaewoo Park, Priyam Mehta +8 more
The paper characterizes Homomorphic Encryption (HE) operations on a real-world Processing-In-Memory (PIM) system, demonstrating that while PIM is a viable alternative to CPUs/GPUs, performance is limi…
The paper introduces Rotary GPU, an exploratory execution approach demonstrating that large Mixture-of-Experts models can be run locally on consumer GPUs with limited VRAM, achieving usable decode thr…
The elasticAI.explorer is an extensible, unified Python framework that simplifies hardware-aware Neural Architecture Search (NAS) by decoupling search space definition from model implementation and de…
The paper details significant enhancements to the SONARR system's core logic, replacing restrictive Boolean logic with generic data type support and adding multi-compute capabilities to improve vulner…
HammerSim is a new gem5-based framework that provides full-system visibility to model the RowHammer vulnerability, allowing researchers to study complex OS effects and hardware/software mitigations.
HammerSim is a novel gem5-based framework that provides full-system visibility to model the RowHammer vulnerability, allowing researchers to evaluate complex hardware and software mitigations.
Jianming Tong, Jingtian Dang, Simon Langowski, Tianhao Huang +5 more
The paper introduces MORPH, a framework that reformulates Zero-Knowledge Proof (ZKP) computations to efficiently utilize AI ASICs like TPUs, achieving up to 10x higher throughput on NTT.
Junyi Yang, Shuai Dong, Zhengnan Fu, Hongyang Shang +1 more
The paper proposes a highly reconfigurable 256x128 in-memory computing array that significantly improves efficiency and performance for analog computing by introducing novel components for ADC, weight…
This paper provides the first systematic, isolated benchmarks of NIST-standardized post-quantum cryptography (ML-KEM and ML-DSA) on the highly constrained ARM Cortex-M0+ processor, showing performance…
ACRONYM is a novel algorithm-hardware co-designed platform that enables high-recall, continuous approximate nearest neighbor search in memory for dynamic vector databases, achieving massive throughput…
The paper introduces Grid Programs, a novel, Turing-complete model of computation where programs are two-dimensional arrangements of instructions, fundamentally departing from linear code structures.
HighTide is an evolving, AI-assisted, open-source benchmark suite for VLSI design, providing a comprehensive and scalable platform for hardware development.
Lukas Einhaus, Natalie Maman, Julian Hoever, Andreas Erbslöh +1 more
The paper proposes a novel convolutional block and optimization algorithm to implement resource-efficient 1D-CNNs for atrial fibrillation detection on tiny smart sensor systems, achieving high accurac…