KBF: Knowledge Boundary as Fingerprint for Language Model and Black-Box API Auditing
The paper introduces KBF, a low-cost black-box auditing protocol that fingerprints LLM APIs by analyzing stable numerical recall near the knowledge boundary, successfully detecting numerous model substitutions and inconsistencies across various production endpoints.
Abstract
More Like ThisRelay and reseller APIs increasingly intermediate access to large language models (LLMs), but users have no direct way to verify that a claimed endpoint is actually serving the advertised model. We introduce KBF, a low-cost black-box auditing protocol that fingerprints model APIs using stable numerical recall near the knowledge boundary. Across 16 production LLM endpoints, KBF flags all 155 economically relevant substitutions without rejecting any same-model controls, remains stable under deployment variation, detects high-separation mixed-routing attacks when only 5-10% of traffic is substituted, and finds that 7 of 27 platform model cells in a six-platform shadow API audit are statistically inconsistent with their reference endpoints, with inconsistencies concentrated on premium Claude endpoints.