Yiqun Liu
2 indexed papers
Research Timeline
The paper introduces PassNet, a large-scale ecosystem for generating compiler passes using LLMs. It demonstrates that LLMs can significantly accelerate graph compilation for long-tail workloads and suggests that consistency is the primary bottleneck.
The paper introduces LongJudgeBench, a new benchmark designed to evaluate the reliability of LLM judges specifically for complex, long-form output evaluation, revealing significant instability gaps in current LLM judging methods.
Papers
Benchmarking LLM-as-a-Judge for Long-Form Output Evaluation
Junjie Chen, Yuxi Dong, Haitao Li, Weihang Su +4 more