Papers similar to 2606.02004

~ similar to 2606.02004· 19 results

cs.IRcs.CLRecentJun 3, 2026

BEATS: Bootstrapping E-commerce Attribute Taxonomies for Search through Iterative Human-AI Collaboration

Yung-Yu Shih, Shang-Yu Su, Tzu-I Ho, Dongzhe Wang +1 more

The paper presents BEATS, a human-in-the-loop LLM framework for bootstrapping product attribute taxonomies from scratch.

View →

cs.CLRecentMay 28, 2026

Auditing LLM Benchmarks with Item Response Theory

Sander Land, Daniel M. Bikel

The paper introduces an Item Response Theory (IRT)-based indicator that effectively identifies likely mislabeled items in existing LLM benchmarks, revealing systematic errors in labeling and model spe…

View →

cs.LGcs.AIRecentMay 29, 2026

ChurnNet: A Optimized Modern AI for Churn Prediction

Syed Saad Saif, Giulio Maggiore, Paolo Russo, Damiano Distante

This paper compares traditional machine learning models (Random Forests, XGBoost, SVM) against a complex Unified Multi-Task Time Series Model for churn prediction, concluding that conventional methods…

View →

cs.LGcs.AIcs.CRRecentMay 28, 2026

NumLeak: Public Numeric Benchmarks as Latent Labels in Foundation Models

Anany Kotawala

The paper introduces NumLeak, a framework demonstrating that top-tier LLMs often exhibit high fidelity recall of specific public numeric benchmarks (like financial factors) due to memorization, which…

View →

cs.LGcs.AIcs.CRRecentMay 28, 2026

NumLeak: Public Numeric Benchmarks as Latent Labels in Foundation Models

Anany Kotawala

The paper introduces NumLeak, a framework demonstrating that top-tier LLMs often exhibit high fidelity recall of specific public numeric benchmarks, suggesting that their apparent skill may be due to…

View →

cs.LGmath.OCmath.PREmpiricalRecentJun 9, 2026

Data-Driven Dynamic Assortment in Online Platforms: Learning about Two Sides

Rahul Roy, Nur Sunar, Jayashankar M. Swaminathan

This paper studies a dynamic assortment problem on a two-sided service platform with incomplete information and heterogeneous customers, and develops a data-driven algorithm to learn parameters and op…

View →

cs.LGmath.OCmath.PREmpiricalRecentJun 9, 2026

Data-Driven Dynamic Assortment in Online Platforms: Learning about Two Sides

Rahul Roy, Nur Sunar, Jayashankar M. Swaminathan

View →

cs.CRcs.SERecentMar 31, 2026

When Labels Are Scarce: A Systematic Mapping of Label-Efficient Code Vulnerability Detection

Noor Khalal, Chakib Fettal, Lazhar Labiod, Mohamed Nadif

This systematic mapping survey reviews label-efficient approaches for code vulnerability detection, synthesizing five paradigm families and providing a decision guide to navigate trade-offs.

View →

cs.CLRecentMay 29, 2026

Disagreeing Rationales: Rethinking Classification and Explainability Evaluation in Hate Speech Detection

Benedetta Muscato, Beiduo Chen, Gizem Gezici, Barbara Plank +1 more

This paper proposes a unified evaluation framework for hate speech detection that systematically assesses model performance and explainability across various label and rationale representation spaces,…

View →

cs.CLRecentJun 1, 2026

Encoded but Not Routed: Explaining the Table-Chart Gap in Scientific Claim Verification

Sunisth Kumar, Xanh Ho, Tim Schopf, Andre Greiner-Petter +2 more

The paper explains the 'table-chart gap' in scientific claim verification by showing that multimodal LLMs successfully encode information from charts but fail to route it to the final prediction layer…

View →

cs.SEcs.CRRecentApr 1, 2026

SERSEM: Selective Entropy-Weighted Scoring for Membership Inference in Code Language Models

Kıvanç Kuzey Dikici, Serdar Kara, Semih Çağlar, Eray Tüzün +1 more

SERSEM introduces a selective entropy-weighted scoring framework to significantly improve Membership Inference Attacks (MIAs) against code LLMs by focusing on human-centric coding anomalies rather tha…

View →

cs.AIRecentMay 28, 2026

Persona Conditioning of Brand Recommendations in Retrieval-Augmented Commercial Chat: A Prominence-Stratified Cross-Provider Audit

Will Jack, Noah Lehman, Keller Maloney, Sarah Xu

The study demonstrates that conditioning AI brand recommendations on a user's persona significantly alters the recommended product set, particularly for mid-market brands, and this effect is largest o…

View →

cs.IRcs.AIRecentMay 29, 2026

Fighting Numerical Hallucinations via Data-centric Compilation for Online Financial QA

Hao Chen, Xing Tang, Qirui Liu, Weijie Shi +5 more

The paper introduces the Data-centric Reasoning Compiler (DCRC), a novel data-driven framework that enhances financial QA systems by compiling user queries and retrieved documents into verifiable, exe…

View →

cs.LGcs.AIstat.MLRecentMay 28, 2026

Calibrated Preference Learning: The Case of Label Ranking

Santo M. A. R. Thies, Viktor Bengs, Timo Kaufmann, Sebastian J. Vollmer +1 more

The paper formalizes the concept of calibration for probabilistic label ranking, demonstrating that popular models are often poorly calibrated and that calibration captures a meaningful quality dimens…

View →

cs.CLcs.AIcs.LGRecentJun 1, 2026

Off-the-Shelf LLMs as Process Scorers: Training-Free Alternative to PRMs for Mathematical Reasoning

Atoosa Chegini, Soheil Feizi

The paper introduces Chunk-Level Guided Generation, a training-free method that uses an off-the-shelf large language model (LLM) as a process scorer to guide small model generation, achieving performa…

View →

cs.AIcs.CLRecentMay 27, 2026

The Importance of Being Statistically Earnest: A Critical Re-evaluation of GSM-Symbolic

Dominika Agnieszka Długosz, Arlindo Oliveira, Natalia Díaz-Rodríguez

The paper challenges the conclusion that LLMs lack reasoning by demonstrating that reported performance drops on GSM-Symbolic are often statistically weak and partially attributable to dataset biases,…

View →

stat.MLcs.AIcs.LGRecentMay 29, 2026

Correcting Split Selection in Online Decision Trees via Anytime-Valid Inference

Salim I. Amoukou, Saumitra Mishra, Manuela Veloso

The paper introduces a new anytime-valid inference method to correct split selection in online decision trees, providing robust statistical guarantees for streaming data that existing methods lack.

View →

cs.CRcs.CLRecentApr 28, 2026

The Surprising Universality of LLM Outputs: A Real-Time Verification Primitive

Alex Bogdan, Adrian de Valois-Franklin

The paper identifies a universal, statistically predictable distribution (Mandelbrot) governing LLM outputs, enabling a highly efficient, model-agnostic scoring primitive for provenance and quality as…

View →

cs.IRcs.AIcs.CLRecentJun 4, 2026

OneReason Technical Report

OneRec Team, Biao Yang, Boyang Ding, Chenglong Chu +80 more

The paper proposes OneReason, a framework that enhances the reasoning capability of generative recommendation models by focusing on improving item perception and structuring user behavior into coheren…

View →