Built with and by Teycir Ben Soltane•
How to Use•FAQ•GitHub•arXiv.org•
Share:
ArXivCSExplorer
☆☆Bookmarks🏆RSSHow to UseFAQ
Home/Authors/Gerhard Wellein

Gerhard Wellein

1 indexed paper

Recent (6 mo)
1
With code
0
Influential cites
0
Benchmarked
0

Publications per year

1
26

Top categories

Distributed×1AI×1Networking×1

Frequent co-authors

Bole Ma1×
Jan Eitzinger1×
Harald Köstler1×

Research Timeline

2026
Move the Query, Not the Cache: Characterizing Cross-Instance Latent Attention Redistribution Across GPU Fabrics

The paper proposes moving the query instead of the KV-cache during cross-instance attention, demonstrating that this approach is significantly cheaper than moving the cache, especially on modern GPU fabrics.

Highlighted terms show continued research focus across papers

Papers

cs.DCcs.AIcs.NIRecentMay 31, 2026

Move the Query, Not the Cache: Characterizing Cross-Instance Latent Attention Redistribution Across GPU Fabrics

Bole Ma, Jan Eitzinger, Harald Köstler, Gerhard Wellein

The paper proposes moving the query instead of the KV-cache during cross-instance attention, demonstrating that this approach is significantly cheaper than moving the cache, especially on modern GPU f…

View →