Papers similar to 2605.29752

~ similar to 2605.29752· 18 results

cs.CRcs.ARcs.LGRecentMar 20, 2026

Hawkeye: Reproducing GPU-Level Non-Determinism

Erez Badash, Dan Boneh, Ilan Komargodski, Megha Srivastava

Hawkeye is a system that allows perfect, precision-preserving reproduction of GPU-level matrix multiplication operations on a CPU, enabling efficient and trustworthy third-party auditing of machine le…

View →

cs.ARRecentJun 1, 2026

O-POPE: High-Frequency Pipelined Outer Product based GEMM acceleration with minimal buffering overhead

Danilo Cammarata, Angelo Garofalo, Luca Benini

O-POPE is a novel outer-product engine that accelerates floating-point GEMM by repurposing FPU pipeline registers as buffers, achieving high utilization and improved energy efficiency.

View →

cs.CRRecentMay 17, 2026

Loaded Dice: Solving the Non-Selection Problem for Scalable Probabilistic RowHammer Defense

Jeonghyun Woo, Junsu Kim, Aamer Jaleel, Prashant J. Nair

The paper proposes PrISM, an intersection-based probabilistic mitigation technique that significantly improves the scalability of RowHammer defense at low thresholds by correlating sampled row history…

View →

cs.ARcs.PFRecentMay 30, 2026

Regular-Dead on Arrival: Characterizing and Protecting Against Dead-Entry TLB Misses in GPU Microarchitectures

Shafayat Mowla Anik, Yongchan Jung, Jeeho Ryoo, Byeong Kil Lee

The paper characterizes 'dead-entry' TLB misses in GPUs, which occur when recently evicted translations are immediately re-walked, and proposes DEPOT, a Bloom filter mechanism that significantly reduc…

View →

cs.LGcs.AIRecentMay 29, 2026

GPU Forecasters: Language Models as Selective Surrogates for Kernel Runtime Optimization

Zaid Khan, Justin Chih-Yao Chen, Jaemin Cho, Elias Stengel-Eskin +1 more

This paper demonstrates that Large Language Models (LLMs) can serve as accurate and selective surrogates for costly GPU kernel performance measurements, significantly expanding the search space for op…

View →

cs.CRcs.ARRecentMay 27, 2026

HammerSim: A System-Level Tool to Model RowHammer

Kaustav Goswami, Ayaz Akram, Hari Venugopalan, Jason Lowe-Power

HammerSim is a new gem5-based framework that provides full-system visibility to model the RowHammer vulnerability, allowing researchers to study complex OS effects and hardware/software mitigations.

View →

cs.CRcs.ARRecentMay 27, 2026

HammerSim: A System-Level Tool to Model RowHammer

Kaustav Goswami, Ayaz Akram, Hari Venugopalan, Jason Lowe-Power

HammerSim is a novel gem5-based framework that provides full-system visibility to model the RowHammer vulnerability, allowing researchers to evaluate complex hardware and software mitigations.

View →

cs.PFcs.ARcs.DCRecentMay 27, 2026

Rotary GPU: Exploring Local Execution Paths for Large Mixture-of-Experts Models Under Limited GPU Memory

Myeong Jun Jo

The paper introduces Rotary GPU, an exploratory execution approach demonstrating that large Mixture-of-Experts models can be run locally on consumer GPUs with limited VRAM, achieving usable decode thr…

View →

cs.LGcs.ARRecentJun 2, 2026

MOSAIC: Efficient Mixture-of-Agent Scheduling via Adaptive Aggregation and Inference Concurrency

Saptarshi Mitra, Yifan Zhang, Rachid Karami, Phyo Pyae Moe Aung +4 more

MOSAIC is a novel scheduling framework that significantly accelerates Mixture-of-Agents (MoA) workloads by jointly optimizing expert placement and utilizing confidence-aware adaptive aggregation.

View →

cs.CRRecentMay 3, 2026

GPU Fingerprinting for Location Verification

Wayne Tee, Jonathan Happel

The paper proposes using hardware fingerprints instead of vulnerable cryptographic keys to enhance the security and robustness of GPU location verification for governing advanced AI development.

View →

cs.AIq-fin.PMRecentMay 27, 2026

PortBench: A Correlation-Aware, Full-Pipeline Benchmark for LLM-Driven Portfolio Management

Yuxuan Zhao, Sijia Chen, Ningxin Su

The paper introduces PortBench, a comprehensive benchmark that evaluates LLMs for portfolio management by assessing both correlation awareness and performance across a full, multi-stage decision pipel…

View →

cs.AIRecentMay 27, 2026

Learning When to Optimize: Verified Optimization Skills from Expert GPU-Kernel Lineages

Shuoming Zhang, Qiuchu Yu, Yangyu Zhang, Ruiyuan Xu +5 more

KLineage introduces a novel method to teach LLMs when and how to apply GPU kernel optimizations by reverse-engineering expert kernel lineages, resulting in superior optimization skills compared to exi…

View →

cs.CRcs.DCRecentMay 31, 2026

GPU Acceleration of Learning With Errors KEMs Using OpenACC for Post-Quantum Cryptography

Tiziana Liberati, Nitin Shukla, Matteo Barbieri, Gabriella Bettonte +4 more

This paper presents a GPU-accelerated implementation of a Learning with Errors (LWE)-based Key Encapsulation Mechanism (KEM), demonstrating significant speedups and energy efficiency gains on modern G…

View →

cs.CRcs.OSRecentMay 30, 2026

Beyond Edge Coverage: Per-Task Data-Flow Extraction at Kernel Function Boundaries via LLVM

Yunseong Kim

The paper introduces BOUNDARY FLOW, an LLVM-based framework that enhances kernel fuzzing and analysis by extracting per-task, state-aware data-flow information (arguments and return values) at functio…

View →

cs.CRcs.ARRecentApr 22, 2026

PVAC: A RowHammer Mitigation Architecture Exploiting Per-victim-row Counting

Jumin Kim, Seungmin Baek, Hwayong Nam, Minbok Wi +2 more

The paper introduces PVAC, a novel victim-based row counting mechanism that accurately tracks RowHammer attacks by incrementing counters on the victim row, thereby improving hammering tolerance and pe…

View →

cs.CRcs.ARcs.LGRecentApr 25, 2026

Tessera: Secure, Near-Line-Rate Weight Streaming for UMA Edge Accelerators

Animan Naskar

Tessera introduces a novel hardware architecture that achieves secure, near-line-rate weight streaming for DNNs on UMA edge accelerators by performing cache-line granularity decryption during DRAM fet…

View →

cs.CRcs.ARcs.LGRecentApr 19, 2026

Bit-Flip Vulnerability of Shared KV-Cache Blocks in LLM Serving Systems

Yuji Yamamoto, Satoshi Matsuura

The paper analyzes the bit-flip vulnerability of shared KV-cache blocks in LLM serving systems, demonstrating that these blocks are susceptible to silent, persistent, and selective data corruption.

View →

cs.CRcs.ARRecentApr 27, 2026

RowHammer Vulnerability Counter (RVC): Redefining RowHammer Detection with Victim-Centric Tracking

Lavi Jain, Venkata Kalyan Tavva

The paper proposes Rowhammer Vulnerability Counter (RVC), a novel framework that improves RowHammer mitigation by tracking a row's actual vulnerability to bit flips rather than relying on simple activ…

View →