ArXivCSExplorer
Built by Teycir Ben Soltane

20 results for “CPU design”

CS papers only

Hybrid search: keyword + semantic, ranked by combined score.


cs.AR • cs.LG • Empirical • Recent • Jun 11, 2026

BigPower: Hierarchical Source-Level Module Power Estimation for CPUs with Large Language Models

Honghua Zhu, Chunjie Luo, Jianfeng Zhan

This paper introduces BigPower, a hierarchical source-level surrogate model for fine-grained module-level power estimation during CPU design using large language models and architectural hierarchy.

cs.AI • Recent • May 29, 2026

Model-Native Computing Architecture: Envisioning Future System Architecture Through the Lens of Computer Architecture

Hai Lin

The paper proposes the Intelligent Computing Architecture Model (ICAM), a six-layer framework that unifies disparate concepts in model-native computing by viewing the LLM stack through a dual-plane ar…

cs.SE • cs.AI • cs.LG • Recent • May 28, 2026

AI-PROPELLER: Warehouse-Scale Interprocedural Code Layout Optimization with AlphaEvolve

Chaitanya Mamatha Ananda, Rajiv Gupta, Mircea Trofin, Aiden Grossman +3 more

AI-PROPELLER introduces a novel interprocedural code layout optimization system that uses an agentic evolutionary workflow to achieve significant, measurable performance gains in large-scale, real-wor…

cs.AR • cs.AI • cs.SE • Recent • Jun 2, 2026

HighTide: An Agent-Curated Open-Source VLSI Benchmark Suite

Benjamin Goldblatt, Paolo Pedroso, Farhad Modaresi, Ethan Sifferman +1 more

HighTide is an evolving, AI-assisted, open-source benchmark suite for VLSI design, providing a comprehensive and scalable platform for hardware development.

cs.LG • cs.AI • cs.AR • Recent • Jun 3, 2026

Uncertainty-Aware End-to-End Co-Design of Neural Network Processors: From Training and Mapping to Fabrication

Yuyang Du, Yujun Huang, Gioele Zardini

This paper presents a unified framework for end-to-end co-design of neural network processors.

cs.AR • Recent • Jun 1, 2026

O-POPE: High-Frequency Pipelined Outer Product based GEMM acceleration with minimal buffering overhead

Danilo Cammarata, Angelo Garofalo, Luca Benini

O-POPE is a novel outer-product engine that accelerates floating-point GEMM by repurposing FPU pipeline registers as buffers, achieving high utilization and improved energy efficiency.

cs.AR • cs.ET • Recent • Jun 4, 2026

Space-CIM: Enabling Compute-In-Memory Accelerators for Thermally-Constrained Space Platforms

Sohan Salahuddin Mugdho, Md. Shahedul Hasan, Cheng Wang

This paper investigates the thermal constraints of deploying AI compute infrastructure in space, comparing GPUs and compute-in-memory (CIM) accelerators using a co-design methodology.

cs.AR • cs.AI • Recent • May 30, 2026

LP5X-PIM Sim: A High-Fidelity HW/SW Integrated Simulator for LPDDR5X-PIM

SangHoon Cha, Jaewan Choi, Byeongho Kim, Yoonah Paik +2 more

This paper introduces a high-fidelity, integrated hardware-software simulator for LPDDR5X-PIM, enabling precise evaluation of system performance and energy efficiency.

cs.PL • cs.CC • cs.FL • Recent • May 30, 2026

Grid Programs: A Two-Dimensional, Variable-Free Model of Computation

Ezequiel López-Rubio

The paper introduces Grid Programs, a novel, Turing-complete model of computation where programs are two-dimensional arrangements of instructions, fundamentally departing from linear code structures.

cs.PF • cs.AR • cs.DC • Recent • May 27, 2026

Rotary GPU: Exploring Local Execution Paths for Large Mixture-of-Experts Models Under Limited GPU Memory

Myeong Jun Jo

The paper introduces Rotary GPU, an exploratory execution approach demonstrating that large Mixture-of-Experts models can be run locally on consumer GPUs with limited VRAM, achieving usable decode thr…

cs.CR • cs.AR • cs.DC • Recent • May 19, 2026

Taking Cryptography Out of the Data Path via Near-Memory Processing in DRAM

Nicola Barcarolo, Brahmaiah Gandham, Mohammad Sadrosadati, Roberto Passerone +2 more

This paper investigates the potential of real-world Processing-in-Memory (PIM) architectures, specifically using UPMEM, to accelerate cryptographic algorithms, demonstrating that distributing computat…

cs.CR • cs.AR • Recent • Mar 28, 2026

Attacking AI Accelerators by Leveraging Arithmetic Properties of Addition

Masoud Heidary, Biresh Kumar Joardar

The paper introduces a novel hardware aging attack that exploits the commutative properties of addition to induce unbalanced stress on AI accelerator transistors, significantly degrading model accurac…

cs.CR • cs.AI • Recent • Mar 20, 2026

Meeting in the Middle: A Co-Design Paradigm for FHE and AI Inference

Bernardo Magri, Benjamin Marsh, Paul Gebheim

The paper proposes a co-design paradigm, 'Meeting in the Middle,' to make Fully Homomorphic Encryption (FHE) practical for AI inference by optimizing both the cryptographic schemes and the underlying…

cs.AR • cs.PF • Recent • May 30, 2026

Regular-Dead on Arrival: Characterizing and Protecting Against Dead-Entry TLB Misses in GPU Microarchitectures

Shafayat Mowla Anik, Yongchan Jung, Jeeho Ryoo, Byeong Kil Lee

The paper characterizes 'dead-entry' TLB misses in GPUs, which occur when recently evicted translations are immediately re-walked, and proposes DEPOT, a Bloom filter mechanism that significantly reduc…

cs.AR • Recent • Jun 1, 2026

CHIMERA: A Flexible and Scalable 3.1 TOPS/W AI-MCU with Transformer Accelerator and 563 Gb/s Shared-L2 Memory Subsystem with QoS Guarantees

Lorenzo Leone, Philip Wiese, Gamze İslamoğlu, Michael Rogenmoser +3 more

The paper introduces Chimera, a highly efficient and scalable MCU designed for ultra-low-power edge AI inference, achieving 3.1 TOPS/W by integrating a dedicated transformer accelerator and a QoS-guar…

cs.CR • Recent • Apr 13, 2026

Hardware-Efficient Compound IC Protection with Lightweight Cryptography

Levent Aksoy, Muhammad Sohaib Munir, Sedat Akleylek

The paper proposes a hardware-efficient compound IC protection mechanism that combines lightweight cryptography with logic locking and hardware obfuscation to secure integrated circuits against variou…

cs.CR • Recent • May 14, 2026

Adapting AlphaEvolve to Optimize Fully Homomorphic Encryption on TPUs

Shruthi Gorantala, Jianming Tong, Asra Ali, Baiyu Li +6 more

The paper adapts AlphaEvolve, an evolutionary search framework, to automate the optimization of Fully Homomorphic Encryption (FHE) kernels on TPUs, achieving significant speedups over human-engi…

cs.AR • cs.CL • cs.LG • Recent • Jun 1, 2026

Multi-Segment Attention: Enabling Efficient KV-Cache Management for Faster Large Language Model Serving

Chunan Shi, Yilei Chen, Yilin Chen, Xupeng Miao +1 more

The paper proposes AsymCache, a computation-latency-aware KV cache management system that optimizes LLM inference by aligning cache eviction decisions with GPU attention kernel performance, significan…

cs.CR • Recent • Mar 18, 2026

SoK: From Silicon to Netlist and Beyond - Two Decades of Hardware Reverse Engineering Research

Zehra Karadağ, Simon Klix, René Walendy, Felix Hahn +4 more

This paper systematizes two decades of hardware reverse engineering research by analyzing 187 publications, identifying key technical methods and recommending improvements for reproducibility, standar…

cs.CR • cs.AR • Recent • May 5, 2026

LIPPEN: A Lightweight In-Place Pointer Encryption Architecture for Pointer Integrity

Erfan Iravani, Lalit Prasad Peri, Mohannad Ismail, Charitha Tumkur Siddalingaradhya +3 more

LIPPEN introduces a novel hardware-software co-design that provides strong, zero-overhead pointer encryption for enhanced memory safety, achieving comprehensive pointer integrity and confidentiality.
