This paper refines word-based grammatical error annotation for L2 Korean by adapting existing resources to better reflect Korean morphology and error types, improving the evaluation of Korean Grammatical Error Correction (K-GEC) systems.
Korean grammatical error correction (K-GEC) faces a structural mismatch: evaluation is word-based, but many learner errors occur at the morpheme level. Postpositions and verbal endings are bound to lexical hosts, yet they encode grammatical relations that correction and evaluation must represent. This paper refines word-based grammatical error annotation for L2 Korean by addressing three connected problems in existing resources: surface target realization, Korean-specific edit annotation, and single-reference evaluation. We reconstruct target sentences from the National Institute of Korean Language (NIKL) L2 corpus under morphologically constrained realization rules and convert its morpheme-level annotations into word-level \texttt{m2} edits. We then define a Korean ERRANT-style annotation scheme that preserves ERRANT's Missing/Replacement/Unnecessary (MRU) core while distinguishing functional morpheme errors, spelling errors, word boundary errors, and word order errors. We also augment the KoLLA corpus with an additional reference correction, yielding a multi-reference evaluation setting for K-GEC. Empirical validation shows that the refined NIKL targets yield lower perplexity, the converted \texttt{m2} files achieve higher agreement with source-target edit representations, and the refined resources improve KoBART-based correction under the same model setting. Multi-reference KoLLA evaluation further reduces the penalty imposed on valid corrections that diverge from a single reference, especially for neural and prompted GEC systems. These results show that K-GEC evaluation depends not only on correction models but also on reference data and edit annotations that reflect Korean morphology, spacing, and correction variability.
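The effect of adding a second reference can be illustrated with a minimal Python sketch. It assumes, as is common in M2-style GEC scoring, that a hypothesis is scored against each reference's edit set and the best-scoring reference is kept. The abstract does not specify the scorer actually used, so the `Edit` tuple, the function names, the choice of F0.5, and the toy sentence are hypothetical illustrations rather than the paper's implementation or corpus data.

```python
from typing import List, Set, Tuple

# Hypothetical edit representation: (start_word, end_word, correction),
# with spans indexing the words of the source sentence, as in M2 files.
Edit = Tuple[int, int, str]


def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """F-beta from edit counts; an empty denominator counts as 1.0."""
    p = tp / (tp + fp) if tp + fp else 1.0
    r = tp / (tp + fn) if tp + fn else 1.0
    if p + r == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * p * r / (b2 * p + r)


def best_reference(hyp: Set[Edit], refs: List[Set[Edit]],
                   beta: float = 0.5) -> Tuple[float, int]:
    """Return (best F-beta, index of the reference that produced it)."""
    best_score, best_idx = -1.0, -1
    for idx, ref in enumerate(refs):
        tp = len(hyp & ref)            # edits shared with this reference
        fp = len(hyp) - tp             # hypothesis edits the reference lacks
        fn = len(ref) - tp             # reference edits the hypothesis missed
        score = f_beta(tp, fp, fn, beta)
        if score > best_score:         # first reference wins on ties
            best_score, best_idx = score, idx
    return best_score, best_idx


# Toy example (not from KoLLA): source "저는 학교를 갔어요".
# Reference 0 corrects word 1 to "학교에", reference 1 to "학교로".
refs = [{(1, 2, "학교에")}, {(1, 2, "학교로")}]
hyp = {(1, 2, "학교로")}                # system output matches reference 1

single = best_reference(hyp, refs[:1])  # only the first reference available
multi = best_reference(hyp, refs)       # both references available
print(single, multi)
```

With only the first reference, the hypothesis edit counts as both a false positive and a miss. With both references available, the same edit is credited in full, which is the kind of penalty reduction the abstract describes.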