Built with and by Teycir Ben Soltane•
How to Use•FAQ•GitHub•arXiv.org•
Share:
ArXivCSExplorer
☆☆Bookmarks🏆RSSHow to UseFAQ
Home/Authors/Bole Ma

Bole Ma

2 indexed papers

Recent (6 mo)
2
With code
0
Influential cites
0
Benchmarked
0

Publications per year

2
26

Top categories

Distributed×2AI×2Networking×1ML×1

Frequent co-authors

Jan Eitzinger2×
Harald Köstler1×
Gerhard Wellein1×
Harald Koestler1×

Research Timeline

2026
Move the Query, Not the Cache: Characterizing Cross-Instance Latent Attention Redistribution Across GPU Fabrics

The paper proposes moving the query instead of the KV-cache during cross-instance attention, demonstrating that this approach is significantly cheaper than moving the cache, especially on modern GPU fabrics.

Leyline: KV Cache Directives for Agentic Inference

Leyline introduces a novel serving-side primitive that allows agentic LLMs to perform targeted, efficient edits to the KV cache, avoiding costly full re-prefilling after content modification.

Highlighted terms show continued research focus across papers

Papers

cs.DCcs.AIcs.NIRecentMay 31, 2026

Move the Query, Not the Cache: Characterizing Cross-Instance Latent Attention Redistribution Across GPU Fabrics

Bole Ma, Jan Eitzinger, Harald Köstler, Gerhard Wellein

The paper proposes moving the query instead of the KV-cache during cross-instance attention, demonstrating that this approach is significantly cheaper than moving the cache, especially on modern GPU f…

View →
cs.DCcs.AIcs.LGRecentMay 31, 2026

Leyline: KV Cache Directives for Agentic Inference

Bole Ma, Jan Eitzinger, Harald Koestler

Leyline introduces a novel serving-side primitive that allows agentic LLMs to perform targeted, efficient edits to the KV cache, avoiding costly full re-prefilling after content modification.

View →