Measuring Form and Function in Language Models
The paper introduces a new quantitative metric, Contextual Alternative Choice (CAC), to rigorously test language models' syntactic and functional understanding of determiners, showing that current models struggle to match human children's performance.
Abstract
More Like ThisWe introduce quantitative metrics for child language acquisition to evaluate language models. Our focus is on the formal syntactic and functional discourse properties of determiners in English, which young children acquire early and accurately. We propose Contextual Alternative Choice (CAC), a new prompting method which provides targeted tests for both syntactic and discourse knowledge of language. The method enables direct comparison of language models against children, and more importantly, against statistical benchmarks independently established in empirical research. No current model trained on a comparable amount of data simultaneously meet both formal and functional benchmarks like human children, but some very large models do. We present our results as methodological and technical contributions, with specific emphasis on cognitive status of language models.