Papers similar to 2606.03014

~ similar to 2606.03014· 20 results

cs.LGcs.AIRecentMay 31, 2026

Beyond Task-Agnostic: Task-Aware Grouping for Communication-Efficient Multi-Task MoE Inference

Zhiyao Xu, Aoxue Liu, Zhanjie Ding, Dan Zhao +2 more

The paper proposes Task-Aware Coactivation Grouping (TACG) to significantly reduce communication costs in multi-task MoE inference by grouping experts based on task-specific co-activation patterns, ou…

View →

cs.MAcs.AIRecentMay 29, 2026

Design and Evaluation of Multi-Agent AI Oracle Systems for Prediction Market Resolution

Tarun Kota

The paper evaluates multi-agent LLM oracle systems for prediction market resolution, finding that independent aggregation with confidence-weighted voting significantly outperforms single-model baselin…

View →

cs.AIRecentMay 31, 2026

Deft Scheduling of Dynamic Cloud Workflows with Varying Deadlines via Mixture-of-Experts

Ya Shen, Gang Chen, Hui Ma, Mengjie Zhang

The paper introduces DEFT, a novel Mixture-of-Experts DRL architecture, to intelligently schedule dynamic cloud workflows with varying deadlines, significantly improving performance over existing sing…

View →

cs.DCcs.AIcs.LGRecentMay 31, 2026

Lodestar: An Online-Learning LLM Inference Router

Gangmuk Lim, Wanyu Zhao, Brighten Godfrey, Jiaxin Shan +2 more

Lodestar is a novel online learning-based request routing system that significantly improves LLM inference efficiency by dynamically assigning incoming requests to the optimal GPU instance to minimize…

View →

cs.MAcs.CLcs.LGRecentJun 1, 2026

Multi-Agent Computer Use

Jing Yu Koh, Ruslan Salakhutdinov, Daniel Fried

The paper proposes Multi-Agent Computer Use (MACU) systems, which significantly improve performance on complex, long-horizon tasks by enabling parallel execution and dynamic task decomposition compare…

View →

cs.PFcs.ARcs.DCRecentMay 27, 2026

Rotary GPU: Exploring Local Execution Paths for Large Mixture-of-Experts Models Under Limited GPU Memory

Myeong Jun Jo

The paper introduces Rotary GPU, an exploratory execution approach demonstrating that large Mixture-of-Experts models can be run locally on consumer GPUs with limited VRAM, achieving usable decode thr…

View →

cs.CLcs.AIcs.MARecentJun 3, 2026

Streaming Communication in Multi-Agent Reasoning

Zhen Yang, Xiaogang Xu, Wen Wang, Cong Chen +2 more

The paper introduces StreamMA, a streaming multi-agent reasoning system that significantly reduces latency and improves effectiveness by passing reasoning steps to downstream agents as they are genera…

View →

cs.LGcs.AIcs.CLEmpiricalRecentJun 10, 2026

Redesign Mixture-of-Experts Routers with Manifold Power Iteration

Songhao Wu, Ang Lv, Ruobing Xie, Yankai Lin

This paper proposes a new router redesign for Mixture-of-Experts models using Manifold Power Iteration to align router rows with the principal singular directions of associated experts.

View →

cs.AIRecentMay 27, 2026

Beyond Consensus: Trace-Level Synthesis in Mixture of Agents

Shreyas Fadnavis, Praitayini Kanakaraj, Felix Wyss

The paper proposes using an LLM aggregator that analyzes complete reasoning traces, demonstrating that trace-level synthesis is superior to traditional consensus methods like majority voting for solvi…

View →

cs.CLcs.AIcs.LGRecentMay 28, 2026

Compute Allocation in Evolutionary Search: From Depth-Breadth to Multi-Armed Bandits

Sixue Xing, Haoyu He, Kerui Wu, Zhuo Yang +3 more

The paper proposes BaSE, a multi-armed bandit approach, to optimally allocate a fixed budget of LLM calls across parallel evolutionary search trajectories, significantly improving mean fitness and rel…

View →

cs.AIRecentMay 28, 2026

Harmonizing Real-Time Constraints and Long-Horizon Reasoning: An Asynchronous Agentic Framework for Dynamic Scheduling

Shijie Cao, Yuan Yuan, Jing Liu

RACE-Sched is an asynchronous agentic framework that successfully integrates low-latency, real-time scheduling decisions with advanced, long-horizon reasoning provided by Large Language Models.

View →

cs.MAcs.AIRecentMay 28, 2026

When Cloud Agents Meet Device Agents: Lessons from Hybrid Multi-Agent Systems

Corrado Rainone, Davide Belli, Bence Major, Arash Behboodi

This paper systematically analyzes the complex design space of hybrid multi-agent systems combining on-device and cloud AI models, finding that the optimal architecture is highly task-dependent and th…

View →

cs.AIcs.LGRecentMay 30, 2026

MOSAIC: Modular Orchestration for Structured Agentic Intelligence and Composition

Yifan Bao, Xinyu Xi, Xinyu Liu, Wen Ge +7 more

MOSAIC introduces a structured agentic framework that treats automated data science as a staged, context-grounded model selection problem, improving performance and traceability over traditional AutoM…

View →

cs.DCcs.ARcs.LGRecentJun 1, 2026

Observation, Not Prediction: Conversation-Level Disaggregated Scheduling for Agentic Serving

Jianru Ding, Ryien Hosseini, Pouya Mahdi Gholami, Mingyuan Xiang +1 more

The paper proposes scheduling LLM agent workloads at the conversation level rather than the turn level, significantly reducing latency and improving energy efficiency by transforming unpredictable mul…

View →

cs.AIRecentMay 31, 2026

DAG-MoE: From Simple Mixture to Structural Aggregation in Mixture-of-Experts

Jiarui Feng, Hanqing Zeng, Karish Grover, Ruizhong Qiu +10 more

The paper proposes DAG-MoE, a novel sparse Mixture-of-Experts framework that replaces standard weighted-sum aggregation with structural aggregation to enhance model performance and enable multi-step r…

View →

cs.AIRecentMay 30, 2026

CoMIC: Collaborative Memory and Insights Circulation for Long-Horizon LLM Agents in Cloud-Edge Systems

Yannan Wang, Longli Yang, Zhen Liu, Abhishek Kumar +1 more

CoMIC is a cloud-edge framework that enables resource-constrained LLM agents to successfully complete complex, long-horizon tasks by collaboratively sharing and refining memory and insights between lo…

View →

cs.CLRecentJun 1, 2026

K-BrowseComp: A Web Browsing Agent Benchmark Grounded in Korean Contexts

Nahyun Lee, Dongkeun Yoon, Guijin Son, Geewook Kim +11 more

The paper introduces K-BrowseComp, a new web-browsing agent benchmark of 400 problems grounded in Korean contexts, demonstrating that current frontier LLMs struggle significantly with complex, context…

View →

cs.LGcs.AIcs.CLRecentMay 29, 2026

PithTrain: A Compact and Agent-Native MoE Training System

Ruihang Lai, Hao Kang, Haozhan Tang, Akaash R. Parthasarathy +5 more

The paper introduces PithTrain, a compact, agent-native Mixture-of-Experts (MoE) training framework that significantly improves agent-task efficiency compared to existing production stacks.

View →

cs.AIRecentMay 27, 2026

Learning When to Optimize: Verified Optimization Skills from Expert GPU-Kernel Lineages

Shuoming Zhang, Qiuchu Yu, Yangyu Zhang, Ruiyuan Xu +5 more

KLineage introduces a novel method to teach LLMs when and how to apply GPU kernel optimizations by reverse-engineering expert kernel lineages, resulting in superior optimization skills compared to exi…

View →

cs.CLRecentMay 29, 2026

dMoE: dLLMs with Learnable Block Experts

Sicheng Feng, Zigeng Chen, Gongfan Fang, Xinyin Ma +1 more

dMoE proposes a block-level Mixture-of-Experts (MoE) framework for Diffusion Large Language Models (dLLMs) that aggregates token-level expert distributions into a unified block-level distribution, sig…

View →