Dongwon Lee
2 indexed papers
Research Timeline
The paper argues that current LLM benchmark datasets are often contaminated because they end up in pretraining data, and proposes that future benchmarks be contamination-resistant and support inference so that model evaluation remains reliable.
The study found that human judgment of logical fallacies is significantly biased by source labels (e.g., human vs. AI), while LLM evaluations remained comparatively stable across these source conditions.
Papers
Label Over Logic? How Source Cues Bias Human Fallacy Judgments More Than LLMs