Refining Word-Based Grammatical Error Annotation for L2 Korean

K-BrowseComp: A Web Browsing Agent Benchmark Grounded in Korean Contexts

The paper introduces K-BrowseComp, a new web-browsing agent benchmark of 400 pro…

TalkTag: Fine-Grained Morphosyntactic Error Annotation for Transcribed Speech

The paper introduces TalkTag, an LLM-based tool that automates fine-grained morp…

KVoiceBench, KOpenAudioBench, and KMMAU: Agent-Driven Korean Speech Benchmarks for Evaluating Speech…

The paper introduces three new Korean speech benchmarks (KVoiceBench, KOpenAudio…

Anchoring LLM Gender Bias to Human Baselines: A Cross-Lingual Audit

The paper audits six LLMs across four languages, finding that their gender stere…

Generating and Refining Dynamic Evaluation Rubrics for LLM-as-a-Judge

The paper introduces a novel, training-free method to automatically generate fin…

Learning the Error Patterns of Language Models

The paper introduces prefix filters and an algorithm (Palla) to systematically l…

CB-SLICE: Concept-Based Interpretable Error Slice Discovery

CB-SLICE is a novel concept-based method for discovering model error slices that…

French parsing enhanced with a word clustering method based on a syntactic lexicon

The paper enhances French parsing accuracy by integrating data from a syntactic…