Yilin Chen
1 indexed paper
Recent (6 mo)
1
With code
0
Influential cites
0
Benchmarked
0
Publications per year
126
Top categories
Architecture ×1, NLP ×1, ML ×1
Research Timeline
2026
Multi-Segment Attention: Enabling Efficient KV-Cache Management for Faster Large Language Model Serving
The paper proposes AsymCache, a computation-latency-aware KV cache management system that optimizes LLM inference by aligning cache eviction decisions with GPU attention kernel performance, significantly reducing both Time-to-First-Token (TTFT) and Time-Per-Output-Token (TPOT).
Papers
cs.AR, cs.CL, cs.LG | Recent | Jun 1, 2026
Multi-Segment Attention: Enabling Efficient KV-Cache Management for Faster Large Language Model Serving
Chunan Shi, Yilei Chen, Yilin Chen, Xupeng Miao +1 more