Bin Cui
3 indexed papers
Publications per year
Top categories
Frequent co-authors
Research Timeline
AutoSci is a memory-centric agentic system designed to automate the entire scientific research lifecycle by integrating structured memory, multi-stage execution, and continuous self-improvement.
The paper proposes DARTS, a distribution-aware active rollout trajectory shaping method that fundamentally accelerates LLM reinforcement learning by actively shaping the long-tail response distribution towards conciseness and certainty.
The paper proposes AsymCache, a computation-latency-aware KV cache management system that optimizes LLM inference by aligning cache eviction decisions with GPU attention kernel performance, significantly reducing both Time-to-First-Token (TTFT) and Time-Per-Output-Token (TPOT).
Papers
Multi-Segment Attention: Enabling Efficient KV-Cache Management for Faster Large Language Model Serving
Chunan Shi, Yilei Chen, Yilin Chen, Xupeng Miao +1 more
The paper proposes AsymCache, a computation-latency-aware KV cache management system that optimizes LLM inference by aligning cache eviction decisions with GPU attention kernel performance, significan…