"Fine-grained credit assignment"

20 results for “Fine-grained credit assignment”

cs.LG · cs.AI · Empirical · Recent · Jun 4, 2026

RREDCoT: Segment-Level Reward Redistribution for Reasoning Models

Mykyta Ielanskyi, Kajetan Schweighofer, Lukas Aichberger, Sepp Hochreiter

This paper introduces RREDCoT, a method for approximating optimal reward redistribution in Chain-of-Thought reasoning language models without additional generation.

cs.LG · math.OC · math.PR · Empirical · Recent · Jun 9, 2026

Data-Driven Dynamic Assortment in Online Platforms: Learning about Two Sides

Rahul Roy, Nur Sunar, Jayashankar M. Swaminathan

This paper studies a dynamic assortment problem on a two-sided service platform with incomplete information and heterogeneous customers, and develops a data-driven algorithm to learn parameters and op…

cs.LG · cs.AI · Recent · May 29, 2026

ARCA: Adapter-Residual Credit Assignment When Token Signals Degenerate

Rodney Lafuente-Mercado

The paper introduces ARCA, a novel credit assignment method that measures token salience directly from the adapter's residual hidden state, addressing the degeneracy of standard intrinsic signals when…

cs.LG · cs.AI · Recent · May 28, 2026

Score Broadcast and Decorrelation: A General Framework for Broadcast-Based Credit Assignment

Mustafa Uzun, Mete Erdogan, Cengiz Pehlevan, Alper T. Erdogan

The paper introduces Score Broadcast and Decorrelation (SBD), a general theoretical framework that unifies broadcast-based credit assignment across various differentiable loss functions by leveraging…

cs.MA · cs.AI · Recent · May 28, 2026

Unifying Temporal and Structural Credit Assignment in LLM-Based Multi-Agent Prompt Optimization

Wenwu Li, Yuran Song, Mingze Zhao, Bo Jin +1 more

The paper proposes a novel temporal and structural credit assignment framework to efficiently optimize multi-agent LLM systems by decomposing the error signal and using targeted, discrete gradient upd…

cs.AI · Recent · May 28, 2026

Beyond Trajectory Rewards: Step-level Credit Assignment for Agentic Search via Graph Modeling

Yuchen Liu, Yingjie Feng, Lixiong Qin, Jiasi Chen +4 more

The paper introduces Graph-Distance Contribution Reward (GDCR) and Step Advantage Policy Optimization (SAPO) to provide fine-grained, step-level credit assignment for agentic search by modeling world…

cs.CL · cs.CR · cs.LG · Recent · Apr 3, 2026

Learning the Signature of Memorization in Autoregressive Language Models

David Ilić, Kostadin Cvejoski, David Stanojević, Evgeny Grigorenko

The paper introduces a novel, transferable learned attack (LT-MIA) that detects a universal 'signature of memorization' in language models, achieving high accuracy across diverse model architectures (…

cs.LG · cs.AI · Recent · May 29, 2026

From Rashomon Theory to PRAXIS: Efficient Decision Tree Rashomon Sets

Zakk Heile, Hayden McTavish, Varun Babbar, Margo Seltzer +1 more

The paper introduces PRAXIS, a novel algorithm that efficiently approximates the computation of 'Rashomon sets' for decision trees, significantly reducing memory and runtime complexity.

cs.CL · cs.LG · Recent · Jun 1, 2026

Machine Learning for Coding Retail Product Names to Consumer-Price Categories: A Rule-plus-Bag-of-Words Pipeline with Reliability-Weighted Human-in-the-Loop Labeling

Vladimir Beskorovainyi

The paper proposes a robust, multi-stage pipeline combining rule-based classification and machine learning to map noisy retail product names to standardized consumption categories, finding that simple…

cs.CL · Recent · May 28, 2026

Auditing LLM Benchmarks with Item Response Theory

Sander Land, Daniel M. Bikel

The paper introduces an Item Response Theory (IRT)-based indicator that effectively identifies likely mislabeled items in existing LLM benchmarks, revealing systematic errors in labeling and model spe…

cs.AI · cs.CL · cs.CY · Recent · May 29, 2026

On Wednesdays, We Ask Questions: Optimizing "Active Listening" in Automated Legal Triage and Referral

Quinten Steenhuis, Jacqueline Harvey

The paper evaluates an automated legal triage system (FETCH) that uses follow-up questions, demonstrating that while low-cost LLMs are effective for classification, generating high-quality questions r…

cs.CL · Recent · May 28, 2026

Generating and Refining Dynamic Evaluation Rubrics for LLM-as-a-Judge

Zijie Wang, Eduardo Blanco

The paper introduces a novel, training-free method to automatically generate fine-grained evaluation rubrics for LLM-as-a-Judge, and further proposes an iterative fine-tuning strategy that significant…

cs.AI · Recent · May 30, 2026

KACE: Knowledge-Adaptive Context Engineering for Mathematical Reasoning

Jayant Parashar, Suchendra M. Bhandarkar

KACE introduces a novel knowledge-adaptive context engineering framework that separates knowledge storage from usage, significantly improving mathematical reasoning accuracy on challenging benchmarks…

cs.LG · cs.CL · Recent · Jun 1, 2026

On the Scaling of PEFT: Towards Million Personal Models of Trillion Parameters

Mind Lab: Song Cao, Vic Cao +51 more

The paper reframes Parameter-Efficient Fine-Tuning (PEFT) from a mere cost-saving alternative to a robust architecture for creating persistent, personalized models that layer specific behaviors onto l…

cs.GT · cs.CR · cs.LG · Recent · May 8, 2026

Quotient Semivalues for False-Name-Resistant Data Attribution

Florian A. D. Burnat, Brittany I. Davidson

The paper introduces the quotient semivalue mechanism to provide fair data attribution that is resistant to contributors manipulating their reported identities by splitting or duplicating data.

cs.CV · Recent · Jun 1, 2026

ToolFG: Towards Well-Grounded Fine-Grained Image Classification

Yu Xue, Haoxuan Qu, Zhuoling Li, Yihang Lou +3 more

The paper introduces ToolFG, a novel tool-integrated MLLM framework that enhances fine-grained image classification by enabling models to autonomously use external tools to gather verifiable visual cu…

cs.CL · cs.LG · Recent · May 29, 2026

Scaling Multi-Hop Training Data via Graph-Constrained Path Selection

Pengyu Chen, Yonggang Zhang, Mingming Chen, Jun Song +2 more

The paper proposes a graph-constrained approach to scale multi-hop training data by decoupling path discovery from path verbalization, significantly expanding the usable corpus size for LLMs.

cs.SE · cs.AI · cs.CL · Recent · May 29, 2026

BlueFin: Benchmarking LLM Agents on Financial Spreadsheets

Srivatsa Kundurthy, Clara Na, Colton Moraine, Anoushka Mohta +5 more

The paper introduces BlueFin, a challenging benchmark for evaluating LLM agents on complex financial spreadsheet tasks, finding that even frontier models perform poorly, scoring less than 50% on avera…

cs.CL · Recent · May 30, 2026

FineVerify: Scaling Test-Time Compute with Fine-Grained Self-Verification for Agentic Search

James Xu Zhao, Hui Chen, Bryan Hooi, See-Kiong Ng

FineVerify introduces a fine-grained self-verification framework that improves agentic search by decomposing complex questions into verifiable sub-questions, leading to significant accuracy gains over…

cs.HC · cs.AI · Recent · May 27, 2026

Learning to Assign Prediction Tasks to Agents with Capacity Constraints

Shang Wu, Saatvik Kher, Padhraic Smyth

This paper develops a policy-learning framework to optimally assign prediction tasks to multiple agents, considering individual agent expertise and capacity constraints, achieving systematic performan…
