~ similar to 2606.05741· 20 results
The paper introduces Rotary GPU, an exploratory execution approach demonstrating that large Mixture-of-Experts models can be run locally on consumer GPUs with limited VRAM, achieving usable decode thr…
The paper characterizes 'dead-entry' TLB misses in GPUs, which occur when recently evicted translations are immediately re-walked, and proposes DEPOT, a Bloom filter mechanism that significantly reduc…
Physical AI inference (batch-1 decode) is primarily memory-bandwidth-bound, but the observed latency gap between fast and slow GPUs is not solely due to memory bandwidth, as launch-side overheads beco…
Space Fabric introduces a novel satellite-based Trusted Execution Architecture (TEE) that establishes trust for orbital computing by generating cryptographic secrets and binding workload execution to…
The paper models how AI-driven data center demand stresses the electrical grid, finding that relying solely on renewable energy certificates (RECs) is insufficient and that on-site storage and spatial…
The paper proposes the Intelligent Computing Architecture Model (ICAM), a six-layer framework that unifies disparate concepts in model-native computing by viewing the LLM stack through a dual-plane ar…
The paper introduces performance ruggedness analysis to quantify performance variance in GEMM workloads, proposing a two-stage software stack that significantly smooths the performance landscape and b…
Harshita Gupta, Mayank Kabra, Jaewoo Park, Priyam Mehta +8 more
The paper characterizes Homomorphic Encryption (HE) operations on a real-world Processing-In-Memory (PIM) system, demonstrating that while PIM is a viable alternative to CPUs/GPUs, performance is limi…
This paper proposes and analyzes architectural designs for space-based Public Key Infrastructure (PKI) to enable secure, low-latency authentication and trust services for rapidly expanding satellite c…
This paper systematically analyzes the complex design space of hybrid multi-agent systems combining on-device and cloud AI models, finding that the optimal architecture is highly task-dependent and th…
The paper introduces Glass Box, a runtime constitutional AI verification layer designed to ensure the safety and adherence to physical laws for autonomous AI systems operating in orbital data centers.
This paper investigates the potential of real-world Processing-in-Memory (PIM) architectures, specifically using UPMEM, to accelerate cryptographic algorithms, demonstrating that distributing computat…
The paper introduces Chimera, a highly efficient and scalable MCU designed for ultra-low-power edge AI inference, achieving 3.1 TOPS/W by integrating a dedicated transformer accelerator and a QoS-guar…
Tessera introduces a novel hardware architecture that achieves secure, near-line-rate weight streaming for DNNs on UMA edge accelerators by performing cache-line granularity decryption during DRAM fet…
The paper introduces SPARROW, an autonomous, open-source platform that uses solar power, edge AI, and satellite communication to enable continuous, scalable biodiversity monitoring in remote global ec…
The elasticAI.explorer is an extensible, unified Python framework that simplifies hardware-aware Neural Architecture Search (NAS) by decoupling search space definition from model implementation and de…
Hyesung Ji, Hyunah Yu, Jongmin Kim, Wonseok Choi +2 more
GPIR is a GPU-accelerated Private Information Retrieval (PIR) system that significantly boosts throughput by introducing a stage-aware hybrid execution model and optimizing data layouts for modern GPU…
Yannan Wang, Longli Yang, Zhen Liu, Abhishek Kumar +1 more
CoMIC is a cloud-edge framework that enables resource-constrained LLM agents to successfully complete complex, long-horizon tasks by collaboratively sharing and refining memory and insights between lo…
Zhe Zhao, Haibin Wen, Yingcheng Wu, Jiaming Ma +9 more
The paper introduces Science Earth, a planet-scale scientific runtime that enables diverse, siloed AI capabilities to connect and collaborate dynamically, demonstrating that scientific discovery can b…
The paper proposes using Trusted-Execution Environments (TEEs) to create a scalable, privacy-preserving system where authors can submit cryptographic proofs of correct research replication, thereby ad…