Jiaxi Wen

1 indexed paper

Top categories

ML ×1, AI ×1, Software Eng. ×1

Frequent co-authors

Yongxi Zhou (1×)

Lai Yun Choi (1×)

Wenbo Ye (1×)

Research Timeline

2026

Accuracy, Stability, and Repeated-Run Reliability of Large Language Models on Deterministic Programming Tasks

The paper demonstrates that standard LLM evaluation metrics overestimate performance because they fail to account for the stability of outcomes, showing a significant gap between reported pass rates and actual retry-free coverage.

Papers

cs.LG, cs.AI, cs.SE · Recent · May 30, 2026

Accuracy, Stability, and Repeated-Run Reliability of Large Language Models on Deterministic Programming Tasks

Yongxi Zhou, Lai Yun Choi, Jiaxi Wen, Wenbo Ye
