Tong Yang

5 indexed papers

Recent (6 mo)

With code

Influential cites

Benchmarked

Publications per year

Top categories

AI×5ML×2NLP×1Optimization and Control×1Stats ML×1

Frequent co-authors

Yaoming Li3×

Guangxiang Zhao2×

Lin Sun2×

Xiangzheng Zhang2×

Wenhan Yu2×

Zhewen Tan2×

Research Timeline

2026

Harness-Bench: Measuring Harness Effects across Models in Realistic Agent Workflows

The paper introduces Harness-Bench, a diagnostic benchmark that measures how different system 'harnesses' affect LLM agent performance in realistic workflows, showing that agent capability must be reported at the model-harness configuration level.

ESPO: Early-Stopping Proximal Policy Optimization

ESPO is a novel reinforcement learning algorithm that detects trajectory failure in large language models and terminates rollouts early, significantly improving performance on mathematical reasoning benchmarks while reducing computational cost.

ConMoE: Expert-Pool Consolidation via Prototype Reassignment for MoE Compression

ConMoE proposes a train-free method for compressing Mixture-of-Experts (MoE) models by consolidating the large expert pool into a smaller set of reusable prototypes and deterministically remapping all original expert calls to these prototypes.

Agentic Transformers Provably Learn to Search via Reinforcement Learning

This paper demonstrates that transformer-based policies can provably learn complex tree search mechanisms, such as depth-first search, purely through reinforcement learning in a stochastic environment.

A Primer in Post-Training Reasoning Data: What We Know About How It Works

This paper synthesizes over 150 scattered studies and reports to provide the first comprehensive primer on post-training reasoning data, organizing the field around data objects, utility, construction, and scalability.

Highlighted terms show continued research focus across papers

Papers

cs.CLcs.AIRecentJun 1, 2026

A Primer in Post-Training Reasoning Data: What We Know About How It Works

Yaoming Li, Guangxiang Zhao, Qilong Shi, Lin Sun +2 more

View →

cs.LGcs.AImath.OCRecentMay 29, 2026