Papers similar to 2606.11613

~ similar to 2606.11613· 19 results

cs.IRcs.CLcs.HCEmpiricalRecentJun 10, 2026

The Long Tail, Not the Front Page: Cold-Start Prediction of Crowd Highlight Salience

Kazuki Nakayashiki, Keisuke Watanabe

This paper predicts the aggregate crowd salience of a document from its text before its marks accumulate.

View →

cs.CLcs.AIRecentJun 1, 2026

Who Annotates in NLP? A Large-scale Assessment of Human Annotation Reporting between 2018 and 2025

Maria Kunilovskaya, Gagan Bhatia, Lisa Sophie Albertelli, Yanran Chen +9 more

This paper conducts a large-scale audit of human annotation reporting in NLP, finding that while reporting has improved, critical details needed to assess annotation validity, such as training and agr…

View →

cs.CVcs.AIcs.CLRecentJun 1, 2026

Multimodal Approaches for Visually-Rich Document Type Classification: A Comparative Analysis

Catyana Heyne, Jürgen Frikel, Filippo Riccio

The paper systematically compares multimodal transformer and LLM approaches for document type classification, finding that specialized multimodal Transformers outperform LLM-based models, especially w…

View →

cs.HCcs.AIcs.LGRecentMay 27, 2026

SmartIterator: Visual Analytics Workflows for Supervising Unsupervised Data Grouping

Gennady Andrienko, Natalia Andrienko

The paper introduces SmartIterator (SI), a visual analytics framework that systematically guides analysts through the complex process of evaluating and understanding how data groupings change across p…

View →

cs.AIRecentMay 28, 2026

Persona Conditioning of Brand Recommendations in Retrieval-Augmented Commercial Chat: A Prominence-Stratified Cross-Provider Audit

Will Jack, Noah Lehman, Keller Maloney, Sarah Xu

The study demonstrates that conditioning AI brand recommendations on a user's persona significantly alters the recommended product set, particularly for mid-market brands, and this effect is largest o…

View →

cs.IRcs.AIcs.CYRecentMay 27, 2026

Whose Name Comes Up? III: Persona Prompting Effects in LLM-Based Scholar Recommendation

Annabella Sánchez-Guzmán, Lukas Eberhard, Denis Helic, Lisette Espín-Noboa

The paper proposes a comprehensive benchmark to systematically audit how varying persona prompts and model choices affect the technical quality and social representativeness of scholar recommendations…

View →

cs.CLcs.CRRecentMay 9, 2026

BiAxisAudit: A Novel Framework to Evaluate LLM Bias Across Prompt Sensitivity and Response-Layer Divergence

Jialing Gan, Junhao Dong, Songze Li

The paper introduces BiAxisAudit, a novel framework that evaluates LLM bias by analyzing bias scores across multiple prompt formats and within the internal inconsistency of model responses, revealing…

View →

cs.CLcs.AIcs.LGRecentJun 1, 2026

"I've Seen How This Goes": Characterizing Diversity via Progressive Conditional Surprise

Matthew Khoriaty, David Williams-King, Shi Feng

The paper introduces the Decan metric, a novel, information-theoretic approach for measuring creative diversity in AI outputs, which successfully detects diversity loss across different model fine-tun…

View →

cs.CLRecentJun 1, 2026

What to Format and How: A Benchmark and Workflow Approach for Document Formatting

Shihao Rao, Liang Li, Jiapeng Liu, Tong Lin +5 more

The paper introduces DocFormBench, a new benchmark for content-aware document formatting, and proposes DocFormFlow, a workflow that improves formatting accuracy and efficiency by decoupling target loc…

View →

cs.CYcs.CLcs.CRRecentApr 15, 2026

Who Gets Flagged? The Pluralistic Evaluation Gap in AI Content Watermarking

Alexander Nemecek, Osama Zafar, Yuqiao Xu, Wenbiao Li +1 more

The paper argues that current AI content watermarking benchmarks fail to test for bias across different languages, cultures, and demographics, proposing a new set of evaluation standards to ensure fai…

View →

cs.AIcs.CLcs.CYRecentMay 27, 2026

Show, Don't TELL: Explainable AI-Generated Text Detection

Aldan Creo, Suraj Ranganath

The paper introduces TELL, a novel explainable AI-generated text detection architecture that provides detailed, human-understandable explanations for its scores, achieving competitive performance whil…

View →

cs.CLcs.AIRecentJun 1, 2026

Argument Collapse: LLMs Flatten Long-Form Public Debate

Yekyung Kim, Yapei Chang, Chau Minh Pham, Mohit Iyyer

The paper demonstrates 'argument collapse,' showing that LLMs tend to converge on a small, repetitive set of polished arguments when generating long-form public debates, significantly reducing the div…

View →

cs.DCcs.AIcs.CLRecentJun 1, 2026

Compliance-Scored Best-of-N Guardrail Orchestration for Multimodal Document Generation in Payments Dispute Defense

Nataraj Agaram Sundar, Tejas Morabia

The paper introduces a novel guardrail orchestration layer that improves the compliance and efficiency of high-stakes multimodal document generation by scoring multiple generated candidates against we…

View →

cs.CLcs.AIcs.LGRecentJun 4, 2026

Operation-Guided Progressive Human-to-AI Text Transformation Benchmark for Multi-Granularity AI-Text Detection

Sondos Mahmoud Bsharat, Jiacheng Liu, Xiaohan Zhao, Tianjun Yao +8 more

The paper introduces OpAI-Bench, a novel benchmark designed to study how AI authorship signals evolve and accumulate during the progressive co-editing process between humans and AI.

View →

cs.DLcs.LGRecentJun 1, 2026

The Ghost Couple: Correlated LLM Name Priors and Their Haunting of the Web and Academic Publishing

Michał Brzozowski, Neo Christopher Chung

The paper demonstrates that LLMs generate correlated, non-existent character ensembles (ghost couples) whose co-occurrence rates are highly predictable and model-specific, leading to the creation of f…

View →

cs.AIRecentJun 1, 2026

Community-Aware Assessment of Social Textual Engagement and Resonance: A Human-Centric Perspective on User-Generated Content Evaluation

Tianjiao Li, Kai Zhao, Xiang Li, Yang Liu +1 more

The paper introduces CASTER, a new human-centric task for evaluating User-Generated Content (UGC) resonance, and proposes MEDEA, an architecture that uses a Social Chain-of-Thought mechanism to simula…

View →

cs.CLcs.AIcs.CVRecentMay 31, 2026

Dr. DocBench: A Comprehensive Benchmark for Expert-Level and Difficult Document Parsing

Minglai Yang, Xinyan Velocity Yu, Pengyuan Li, Xinyu Guo +21 more

The paper introduces Dr. DocBench, a difficulty-aware, comprehensive benchmark designed to rigorously test expert-level and challenging document parsing capabilities for VLMs, demonstrating that curre…

View →

cs.CLRecentMay 31, 2026

Before and After Temperature: A Distributional View of Creative LLM Generation

V. S. Raghu Parupudi, Harsha Ponnada, Aditi Kaushal, S. Shria Parupudi +2 more

The paper introduces a novel, per-token feature derived from how sampling temperature reshapes the token distribution, demonstrating it is a significantly stronger predictor of LLM creativity than sta…

View →

cs.AIRecentMay 27, 2026

Human-like in-group bias in instruction-tuned language model agents

Messi H. J. Lee

This study demonstrates that instruction-tuned language model agents exhibit robust, group-contingent in-group bias, structurally mimicking human social biases, even when standard action logs fail to…

View →