~ similar to 2606.11613· 19 results
This paper predicts the aggregate crowd salience of a document from its text before its marks accumulate.
This paper conducts a large-scale audit of human annotation reporting in NLP, finding that while reporting has improved, critical details needed to assess annotation validity, such as training and agr…
The paper systematically compares multimodal transformer and LLM approaches for document type classification, finding that specialized multimodal Transformers outperform LLM-based models, especially w…
The paper introduces SmartIterator (SI), a visual analytics framework that systematically guides analysts through the complex process of evaluating and understanding how data groupings change across p…
The study demonstrates that conditioning AI brand recommendations on a user's persona significantly alters the recommended product set, particularly for mid-market brands, and this effect is largest o…
The paper proposes a comprehensive benchmark to systematically audit how varying persona prompts and model choices affect the technical quality and social representativeness of scholar recommendations…
The paper introduces BiAxisAudit, a novel framework that evaluates LLM bias by analyzing bias scores across multiple prompt formats and within the internal inconsistency of model responses, revealing…
The paper introduces the Decan metric, a novel, information-theoretic approach for measuring creative diversity in AI outputs, which successfully detects diversity loss across different model fine-tun…
Shihao Rao, Liang Li, Jiapeng Liu, Tong Lin +5 more
The paper introduces DocFormBench, a new benchmark for content-aware document formatting, and proposes DocFormFlow, a workflow that improves formatting accuracy and efficiency by decoupling target loc…
Alexander Nemecek, Osama Zafar, Yuqiao Xu, Wenbiao Li +1 more
The paper argues that current AI content watermarking benchmarks fail to test for bias across different languages, cultures, and demographics, proposing a new set of evaluation standards to ensure fai…
The paper introduces TELL, a novel explainable AI-generated text detection architecture that provides detailed, human-understandable explanations for its scores, achieving competitive performance whil…
The paper demonstrates 'argument collapse,' showing that LLMs tend to converge on a small, repetitive set of polished arguments when generating long-form public debates, significantly reducing the div…
The paper introduces a novel guardrail orchestration layer that improves the compliance and efficiency of high-stakes multimodal document generation by scoring multiple generated candidates against we…
The paper introduces OpAI-Bench, a novel benchmark designed to study how AI authorship signals evolve and accumulate during the progressive co-editing process between humans and AI.
The paper demonstrates that LLMs generate correlated, non-existent character ensembles (ghost couples) whose co-occurrence rates are highly predictable and model-specific, leading to the creation of f…
Tianjiao Li, Kai Zhao, Xiang Li, Yang Liu +1 more
The paper introduces CASTER, a new human-centric task for evaluating User-Generated Content (UGC) resonance, and proposes MEDEA, an architecture that uses a Social Chain-of-Thought mechanism to simula…
Minglai Yang, Xinyan Velocity Yu, Pengyuan Li, Xinyu Guo +21 more
The paper introduces Dr. DocBench, a difficulty-aware, comprehensive benchmark designed to rigorously test expert-level and challenging document parsing capabilities for VLMs, demonstrating that curre…
The paper introduces a novel, per-token feature derived from how sampling temperature reshapes the token distribution, demonstrating it is a significantly stronger predictor of LLM creativity than sta…
This study demonstrates that instruction-tuned language model agents exhibit robust, group-contingent in-group bias, structurally mimicking human social biases, even when standard action logs fail to…