Gerhard Wellein

1 indexed paper

Recent (6 mo)

With code

Influential cites

Benchmarked

Publications per year

Top categories

Distributed×1AI×1Networking×1

Frequent co-authors

Bole Ma1×

Jan Eitzinger1×

Harald Köstler1×

Research Timeline

2026

Move the Query, Not the Cache: Characterizing Cross-Instance Latent Attention Redistribution Across GPU Fabrics

The paper proposes moving the query instead of the KV-cache during cross-instance attention, demonstrating that this approach is significantly cheaper than moving the cache, especially on modern GPU fabrics.

Highlighted terms show continued research focus across papers

Papers

cs.DCcs.AIcs.NIRecentMay 31, 2026

Move the Query, Not the Cache: Characterizing Cross-Instance Latent Attention Redistribution Across GPU Fabrics

Bole Ma, Jan Eitzinger, Harald Köstler, Gerhard Wellein

View →