Papers similar to 2606.05741

~ similar to 2606.05741· 20 results

cs.PFcs.ARcs.DCRecentMay 27, 2026

Rotary GPU: Exploring Local Execution Paths for Large Mixture-of-Experts Models Under Limited GPU Memory

The paper introduces Rotary GPU, an exploratory execution approach demonstrating that large Mixture-of-Experts models can be run locally on consumer GPUs with limited VRAM, achieving usable decode thr…

View →

cs.ARcs.PFRecentMay 30, 2026

Regular-Dead on Arrival: Characterizing and Protecting Against Dead-Entry TLB Misses in GPU Microarchitectures

Shafayat Mowla Anik, Yongchan Jung, Jeeho Ryoo, Byeong Kil Lee

The paper characterizes 'dead-entry' TLB misses in GPUs, which occur when recently evicted translations are immediately re-walked, and proposes DEPOT, a Bloom filter mechanism that significantly reduc…

View →

cs.ARcs.AIcs.DCRecentMay 28, 2026

Memory-Bound but Not Bandwidth-Limited: The Physical AI Inference Gap in Batch-1 LLM Decode

Josef Chen

Physical AI inference (batch-1 decode) is primarily memory-bandwidth-bound, but the observed latency gap between fast and slow GPUs is not solely due to memory bandwidth, as launch-side overheads beco…

View →

cs.CRRecentMar 24, 2026

Space Fabric: A Satellite-Enhanced Trusted Execution Architecture

Filip Rezabek, Dahlia Malkhi, Amir Yahalom

Space Fabric introduces a novel satellite-based Trusted Execution Architecture (TEE) that establishes trust for orbital computing by generating cryptographic secrets and binding workload execution to…

View →

econ.EMcs.AIRecentMay 30, 2026

Certificates without Electrons? Theory and Evidence on Impacts from AI-Driven Power Demand

Dana Golden, Aruna Balasubramanian, Niranjan Balasubramanian

The paper models how AI-driven data center demand stresses the electrical grid, finding that relying solely on renewable energy certificates (RECs) is insufficient and that on-site storage and spatial…

View →

cs.AIRecentMay 29, 2026

Model-Native Computing Architecture: Envisioning Future System Architecture Through the Lens of Computer Architecture

Hai Lin

The paper proposes the Intelligent Computing Architecture Model (ICAM), a six-layer framework that unifies disparate concepts in model-native computing by viewing the LLM stack through a dual-plane ar…

View →

cs.PFcs.ARcs.DCRecentMay 28, 2026

From Roofline to Ruggedness: Decomposing and Smoothing the GEMM Performance Landscape

Aditya Chatterjee

The paper introduces performance ruggedness analysis to quantify performance variance in GEMM workloads, proposing a two-stage software stack that significantly smooths the performance landscape and b…

View →

cs.CRRecentMay 13, 2026

HE-PIM: Demystifying Homomorphic Operations on a Real-world Processing-in-Memory System

Harshita Gupta, Mayank Kabra, Jaewoo Park, Priyam Mehta +8 more

The paper characterizes Homomorphic Encryption (HE) operations on a real-world Processing-In-Memory (PIM) system, demonstrating that while PIM is a viable alternative to CPUs/GPUs, performance is limi…

View →

cs.CRcs.ETRecentMay 7, 2026

Toward Space-Based Public Key Systems: Enabling Secure Space Communications through In-Orbit Trust Services

Rehana Yasmin, Paulo Esteves-Verissimo, Ali Shoker

This paper proposes and analyzes architectural designs for space-based Public Key Infrastructure (PKI) to enable secure, low-latency authentication and trust services for rapidly expanding satellite c…

View →

cs.MAcs.AIRecentMay 28, 2026

When Cloud Agents Meet Device Agents: Lessons from Hybrid Multi-Agent Systems

Corrado Rainone, Davide Belli, Bence Major, Arash Behboodi

This paper systematically analyzes the complex design space of hybrid multi-agent systems combining on-device and cloud AI models, finding that the optimal architecture is highly task-dependent and th…

View →

cs.ETcs.AIcs.ARRecentJun 2, 2026

Glass Box at Orbit: A Constitutional AI Verification Framework for Trustworthy Autonomous CubeSat Intelligence

Karthik Barma, Anil Sanneboyina, V C Premchand Yadav

The paper introduces Glass Box, a runtime constitutional AI verification layer designed to ensure the safety and adherence to physical laws for autonomous AI systems operating in orbital data centers.

View →

cs.CRcs.ARcs.DCRecentMay 19, 2026

Taking Cryptography Out of the Data Path via Near-Memory Processing in DRAM

Nicola Barcarolo, Brahmaiah Gandham, Mohammad Sadrosadati, Roberto Passerone +2 more

This paper investigates the potential of real-world Processing-in-Memory (PIM) architectures, specifically using UPMEM, to accelerate cryptographic algorithms, demonstrating that distributing computat…

View →

cs.ARRecentJun 1, 2026

CHIMERA: A Flexible and Scalable 3.1 TOPS/W AI-MCU with Transformer Accelerator and 563 Gb/s Shared-L2 Memory Subsystem with QoS Guarantees

Lorenzo Leone, Philip Wiese, Gamze İslamoğlu, Michael Rogenmoser +3 more

The paper introduces Chimera, a highly efficient and scalable MCU designed for ultra-low-power edge AI inference, achieving 3.1 TOPS/W by integrating a dedicated transformer accelerator and a QoS-guar…

View →

cs.CRcs.ARcs.LGRecentApr 25, 2026

Tessera: Secure, Near-Line-Rate Weight Streaming for UMA Edge Accelerators

Animan Naskar

Tessera introduces a novel hardware architecture that achieves secure, near-line-rate weight streaming for DNNs on UMA edge accelerators by performing cache-line granularity decryption during DRAM fet…

View →

eess.SPcs.AIRecentMay 27, 2026

Project SPARROW and the Future of Conservation Technology

Juan M. Lavista Ferres, Carl Chalmers, Bruno Demuro Segundo, Zhongqi Miao +13 more

The paper introduces SPARROW, an autonomous, open-source platform that uses solar power, edge AI, and satellite communication to enable continuous, scalable biodiversity monitoring in remote global ec…

View →

cs.ARRecentMay 28, 2026

elasticAI.explorer: Towards a Unified End-to-End Framework for Hardware-Aware Neural Architecture Search

Natalie Maman, Florian Hettstedt, Andreas Erbslöh, Gregor Schiele

The elasticAI.explorer is an extensible, unified Python framework that simplifies hardware-aware Neural Architecture Search (NAS) by decoupling search space definition from model implementation and de…

View →

cs.CRcs.ARRecentApr 6, 2026

GPIR: Enabling Practical Private Information Retrieval with GPUs

Hyesung Ji, Hyunah Yu, Jongmin Kim, Wonseok Choi +2 more

GPIR is a GPU-accelerated Private Information Retrieval (PIR) system that significantly boosts throughput by introducing a stage-aware hybrid execution model and optimizing data layouts for modern GPU…

View →

cs.AIRecentMay 30, 2026

CoMIC: Collaborative Memory and Insights Circulation for Long-Horizon LLM Agents in Cloud-Edge Systems

Yannan Wang, Longli Yang, Zhen Liu, Abhishek Kumar +1 more

CoMIC is a cloud-edge framework that enables resource-constrained LLM agents to successfully complete complex, long-horizon tasks by collaboratively sharing and refining memory and insights between lo…

View →

cs.AIRecentMay 31, 2026

Science Earth: Towards A Planet-Scale Operating System for AI-Native Scientific Discovery

Zhe Zhao, Haibin Wen, Yingcheng Wu, Jiaming Ma +9 more

The paper introduces Science Earth, a planet-scale scientific runtime that enables diverse, siloed AI capabilities to connect and collaborate dynamically, demonstrating that scientific discovery can b…

View →

cs.CRRecentMar 25, 2026

Trusted-Execution Environment (TEE) for Solving the Replication Crisis in Academia

Jiasun Li, Project Team

The paper proposes using Trusted-Execution Environments (TEEs) to create a scalable, privacy-preserving system where authors can submit cryptographic proofs of correct research replication, thereby ad…

View →