Souvik Kundu
3 indexed papers
Publications per year
Top categories
Frequent co-authors
Research Timeline
The paper systematically analyzes the benefits and limits of Attention-FFN Disaggregation (AFD) for Mixture-of-Experts (MoE) LLM serving, demonstrating that AFD is crucial for achieving high throughput under strict latency constraints.
SPARQLe is a hardware-software co-design framework that exploits the inherent sub-precision sparsity of LLM activations to reduce memory traffic and enable efficient computation on lower-bit datapaths, significantly accelerating inference.
MOSAIC is a novel scheduling framework that significantly accelerates Mixture-of-Agents (MoA) workloads by jointly optimizing expert placement and utilizing confidence-aware adaptive aggregation.
Papers
MOSAIC: Efficient Mixture-of-Agent Scheduling via Adaptive Aggregation and Inference Concurrency
MOSAIC is a novel scheduling framework that significantly accelerates Mixture-of-Agents (MoA) workloads by jointly optimizing expert placement and utilizing confidence-aware adaptive aggregation.