Souvik Kundu

3 indexed papers

Top categories

ML ×2, Architecture ×2, AI ×1, Distributed ×1

Frequent co-authors

Yifan Zhang ×1

Research Timeline

2026

How Far Can Disaggregation Go? A Design-Space Exploration of Attention-FFN Disaggregation for Efficient MoE LLM Serving

The paper systematically analyzes the benefits and limits of Attention-FFN Disaggregation (AFD) for Mixture-of-Experts (MoE) LLM serving, demonstrating that AFD is crucial for achieving high throughput under strict latency constraints.

SPARQLe: Sub-Precision Activation Representation for Quantized LLM Inference

SPARQLe is a hardware-software co-design framework that exploits the inherent sub-precision sparsity of LLM activations to reduce memory traffic and enable efficient computation on lower-bit datapaths, significantly accelerating inference.

MOSAIC: Efficient Mixture-of-Agent Scheduling via Adaptive Aggregation and Inference Concurrency

MOSAIC is a novel scheduling framework that significantly accelerates Mixture-of-Agents (MoA) workloads by jointly optimizing expert placement and utilizing confidence-aware adaptive aggregation.

Papers

cs.LG · cs.AR · Recent · Jun 2, 2026

MOSAIC: Efficient Mixture-of-Agent Scheduling via Adaptive Aggregation and Inference Concurrency

Saptarshi Mitra, Yifan Zhang, Rachid Karami, Phyo Pyae Moe Aung +4 more

cs.AR · Recent · May 29, 2026