Yilei Chen
2 indexed papers
Publications per year
Top categories
Frequent co-authors
Research Timeline
The paper introduces CREBench, a comprehensive benchmark for evaluating Large Language Models (LLMs) on cryptographic binary reverse engineering, finding that while LLMs show promise, human experts still maintain a significant advantage.
The paper proposes AsymCache, a computation-latency-aware KV cache management system that optimizes LLM inference by aligning cache eviction decisions with GPU attention kernel performance, significantly reducing both Time-to-First-Token (TTFT) and Time-Per-Output-Token (TPOT).
Papers
Multi-Segment Attention: Enabling Efficient KV-Cache Management for Faster Large Language Model Serving
Chunan Shi, Yilei Chen, Yilin Chen, Xupeng Miao +1 more
The paper proposes AsymCache, a computation-latency-aware KV cache management system that optimizes LLM inference by aligning cache eviction decisions with GPU attention kernel performance, significan…