Tom Biskupski
1 indexed paper
Recent (6 mo): 1
With code: 0
Influential cites: 0
Benchmarked: 0
Publications per year: [chart omitted]
Top categories
Crypto ×1, AI ×1, ML ×1
Research Timeline
2026
Evaluating the Reliability and Fidelity of Automated Judgment Systems of Large Language Models
This paper evaluates how reliably Large Language Models (LLMs) can serve as automated judges of the quality of other LLMs. It finds that their judgments correlate highly with human judgment when suitable prompts and powerful models are used.