~ similar to 2605.29752 · 18 results
Hawkeye is a system that allows perfect, precision-preserving reproduction of GPU-level matrix multiplication operations on a CPU, enabling efficient and trustworthy third-party auditing of machine le…
O-POPE is a novel outer-product engine that accelerates floating-point GEMM by repurposing FPU pipeline registers as buffers, achieving high utilization and improved energy efficiency.
The paper proposes PrISM, an intersection-based probabilistic mitigation technique that significantly improves the scalability of RowHammer defense at low thresholds by correlating sampled row history…
The paper characterizes 'dead-entry' TLB misses in GPUs, which occur when recently evicted translations are immediately re-walked, and proposes DEPOT, a Bloom filter mechanism that significantly reduc…
This paper demonstrates that Large Language Models (LLMs) can serve as accurate and selective surrogates for costly GPU kernel performance measurements, significantly expanding the search space for op…
HammerSim is a new gem5-based framework that provides full-system visibility to model the RowHammer vulnerability, allowing researchers to study complex OS effects and hardware/software mitigations.
HammerSim is a novel gem5-based framework that provides full-system visibility to model the RowHammer vulnerability, allowing researchers to evaluate complex hardware and software mitigations.
The paper introduces Rotary GPU, an exploratory execution approach demonstrating that large Mixture-of-Experts models can be run locally on consumer GPUs with limited VRAM, achieving usable decode thr…
MOSAIC is a novel scheduling framework that significantly accelerates Mixture-of-Agents (MoA) workloads by jointly optimizing expert placement and utilizing confidence-aware adaptive aggregation.
The paper proposes using hardware fingerprints instead of vulnerable cryptographic keys to enhance the security and robustness of GPU location verification for governing advanced AI development.
The paper introduces PortBench, a comprehensive benchmark that evaluates LLMs for portfolio management by assessing both correlation awareness and performance across a full, multi-stage decision pipel…
Shuoming Zhang, Qiuchu Yu, Yangyu Zhang, Ruiyuan Xu +5 more
KLineage introduces a novel method to teach LLMs when and how to apply GPU kernel optimizations by reverse-engineering expert kernel lineages, resulting in superior optimization skills compared to exi…
This paper presents a GPU-accelerated implementation of a Learning with Errors (LWE)-based Key Encapsulation Mechanism (KEM), demonstrating significant speedups and energy efficiency gains on modern G…
The paper introduces BOUNDARY FLOW, an LLVM-based framework that enhances kernel fuzzing and analysis by extracting per-task, state-aware data-flow information (arguments and return values) at functio…
Jumin Kim, Seungmin Baek, Hwayong Nam, Minbok Wi +2 more
The paper introduces PVAC, a novel victim-based row counting mechanism that accurately tracks RowHammer attacks by incrementing counters on the victim row, thereby improving hammering tolerance and pe…
Tessera introduces a novel hardware architecture that achieves secure, near-line-rate weight streaming for DNNs on UMA edge accelerators by performing cache-line granularity decryption during DRAM fet…
The paper analyzes the bit-flip vulnerability of shared KV-cache blocks in LLM serving systems, demonstrating that these blocks are susceptible to silent, persistent, and selective data corruption.
The paper proposes Rowhammer Vulnerability Counter (RVC), a novel framework that improves RowHammer mitigation by tracking a row's actual vulnerability to bit flips rather than relying on simple activ…