Lai Yun Choi
1 indexed paper
Recent (6 mo): 1
With code: 0
Influential cites: 0
Benchmarked: 0
Publications per year
2026: 1
Top categories
ML ×1, AI ×1, Software Eng. ×1
Research Timeline
2026
Accuracy, Stability, and Repeated-Run Reliability of Large Language Models on Deterministic Programming Tasks
The paper demonstrates that standard LLM evaluation metrics overestimate performance because they fail to account for the stability of outcomes, showing a significant gap between reported pass rates and actual retry-free coverage.
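To make the distinction in this summary concrete, the sketch below contrasts an average pass rate with a stricter stability measure. It assumes "retry-free coverage" means the fraction of tasks that pass on every repeated run; the paper's own definition may differ. The function names and the run data are hypothetical illustrations, not results from the paper.

```python
from typing import Dict, List


def mean_pass_rate(runs: Dict[str, List[bool]]) -> float:
    """Share of all individual runs that passed, pooled across tasks."""
    total = sum(len(results) for results in runs.values())
    passed = sum(sum(results) for results in runs.values())
    return passed / total


def retry_free_coverage(runs: Dict[str, List[bool]]) -> float:
    """Share of tasks that passed on every run (assumed definition)."""
    stable = sum(1 for results in runs.values() if results and all(results))
    return stable / len(runs)


# Hypothetical outcomes: four tasks, four repeated runs each.
runs = {
    "task_a": [True, True, True, True],
    "task_b": [True, False, True, True],
    "task_c": [True, True, False, True],
    "task_d": [False, False, False, False],
}

print(f"mean pass rate:      {mean_pass_rate(runs):.3f}")
print(f"retry-free coverage: {retry_free_coverage(runs):.3f}")
```

In this made-up data most individual runs pass, yet only one task succeeds every time, so a pooled pass rate looks much better than the share of tasks that never need a retry.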