Chen

50 indexed papers

Top categories

AI ×13, ML ×8, Vision ×7, NLP ×7, Distributed ×7, Architecture ×4, Software Eng. ×4, Sound ×4

Frequent co-authors

Bing Xue (2×)

Mengjie Zhang (2×)

Yuchen Li (2×)

Tao Chen (2×)

Yu Cheng (2×)

Jiaqi Yang (1×)

Research Timeline

2026

X-Stage: An Overlooked Pipeline Stage for Communication-Computation Overlap in DiT Inference

This paper introduces X-Stage, a software-visible post-issue pipeline stage to improve communication efficiency in distributed diffusion transformer (DiT) inference, leading to significant speedups for DeepGEMM MegaMoE and Ulysses sequence-parallel attention.

Libra: Taming Attention Workload Skew in Long-Context LLM Training with Bounded Sequence Pool

Libra is a load-balancing approach for long-context LLM training that groups packed sequences into fixed-size pools to reduce attention workload variance, improving end-to-end throughput and speeding up straggler attention.

Gleam: Adaptive Network-Efficient CUDA API Remoting for Cross-Device GPU Sharing over LANs

This paper proposes Gleam, a CUDA API remoting framework for sharing GPUs across devices on a local-area network, reducing bandwidth overhead, lowering API call latency, and ensuring context consistency.

An Exact Counterexample to Carlson's Associated-Prime Depth Conjecture from a Group of Order 128

The paper provides a negative answer to Carlson's question about whether the depth of a finite-group cohomology ring is always realized by the dimension of one of its associated primes.

TRUAV: Distributed Multi-Agent Reinforcement Learning for Trajectory Planning and Routing Enhancement in UAV-Aided IoT-Enabled VANETs

This paper presents TRUAV, a distributed multi-agent reinforcement learning framework for joint UAV trajectory planning and routing enhancement in UAV-aided VANETs, eliminating the need for global state exchange.

Benchmarking Zero-Shot LLM-Generated Parent Selection in Genetic Programming for Symbolic Regression

This paper benchmarks zero-shot synthesis of parent-selection operators across eight large language models and finds that Claude Sonnet 4.6 and Gemini 3.1 Pro perform strongly, with the best operator surpassing automatic baselines.

Expose Your Disguise: Recovering Source Speaker Identity From Voice Conversion

The paper proposes TRIDENT, a framework to restore a source speaker's identity from converted audio using a three-pronged architecture.

ClinFusion: A Vision-Centric Multimodal LLM System for Holistic Medical Understanding

This paper introduces ClinFusion, a vision-centric multimodal large language model designed for holistic medical understanding, featuring a Cascade Spatial-Aware Locality Fusion operator and a vision-grounded evaluation framework.

Data Pyramid for Embodied Manipulation

This paper organizes embodied data sources for multimodal foundation models into a pyramid, focusing on real-robot, UMI-style, egocentric and exocentric, simulation, and general vision-language data.

Beyond Prefill-Decode Disaggregation: Dissecting LLM Inference for Heterogeneous Platforms via Dynamic Operator Scheduling

This paper presents DOPS, a hardware-aware framework for optimizing operator scheduling and weight layouts in Large Language Models, achieving significant speedups over prefill-decode disaggregation.

Covert Semantic Transmission in ISAC: Dual-Functional Waveform Design and Rectified Flow-Assisted Recovery

This paper proposes CoSMIC, a framework for semantic integrated sensing and communication (ISAC) that embeds semantic information into waveforms while maintaining covertness and sensing fidelity.

Desktop-Delta Bench: Do Computer-Use Models Understand Desktop GUI Transitions?

This paper introduces Desktop-Delta Bench (DDB), an offline step-level benchmark for evaluating computer-use agents' ability to reconstruct causal transitions in desktop GUI environments.

RSIBench-Data: Benchmarking Data-Centric Research for Recursive Self-Improvement

This paper introduces RSIBench-Data, a controlled benchmark for evaluating data-centric research capabilities of LLM agents.

CW-Ghost: Search-Free Granularity Selection for Helper-Thread Prefetching via Capacity Windows

This paper introduces CW-Ghost, a method for estimating cache line fill volume and determining helper-thread prefetching granularity based on cache capacity constraints.

Specula: Scaling formal specifications for autonomous model checking of system code

Specula is an autonomous system that generates high-quality formal specifications for large, complex code using LLMs, improving understanding and finding bugs.

Sharpness-aware Model Merging with Salience Recovery for LLM-based Cross-Domain Sequential Recommendation

The paper proposes SharpRec, a framework for LLM-based Cross-Domain Sequential Recommendation to address the bottlenecks of cross-domain knowledge conflict and performance saturation in multi-domain fusion.

The Case Against Generation for Retrieval: Discriminative Language Models as Effective Retrievers

This paper adapts Large Language Models as semantic representation backbones in a two-tower retrieval architecture for high-throughput, large-scale recommendation systems.

The Best of Times, the Worst of Times: Moment-Based Analysis of Probabilistic Cost Structures

This paper presents a compositional cost analysis for probabilistic programs with hierarchical cost structures, allowing computation of mean and higher moments of non-additive costs.

Sequential Preconditioned Conjugate Gradient Method for Linear Statistical Models

This paper proposes a randomized iterative method called Sequential Preconditioned Conjugate Gradient Method (SPCG) for large-scale linear statistical models, which significantly reduces computational cost by solving smaller subproblems.

From Role Prompt to Infinite Thinking: Exploiting Persona Conditioning for Inference Cost Attacks in LLMs

This paper reveals a new vulnerability in large language model (LLM) inference efficiency caused by persona consistency and proposes RolePlay, a framework to amplify inference costs.

Papers

cs.AR · NEW · Empirical · Jul 28, 2026

Beyond Prefill-Decode Disaggregation: Dissecting LLM Inference for Heterogeneous Platforms via Dynamic Operator Scheduling

Jiaqi Yang, Jiayi Li, Yihan Fu, Hongxiao Zhao +4 more

This paper presents DOPS, a hardware-aware framework for optimizing operator scheduling and weight layouts in Large Language Models, achieving significant speedups over prefill-decode disaggregation.

eess.SP · cs.IT