"Reward redistribution" | ArxivCSExplorer

20 results for “Reward redistribution”

Hybrid search: keyword + semantic, ranked by combined score.

cs.LG · cs.AI · Empirical · Recent · Jun 4, 2026

RREDCoT: Segment-Level Reward Redistribution for Reasoning Models

Mykyta Ielanskyi, Kajetan Schweighofer, Lukas Aichberger, Sepp Hochreiter

This paper introduces RREDCoT, a method for approximating optimal reward redistribution in Chain-of-Thought reasoning language models without additional generation.

cs.GT · cs.AI · cs.MA · Recent · May 29, 2026

Social welfare optimisation under institutional reward and punishment

Van An Nguyen, Vuong Khang Huynh, Huu Loi Bui, Hai Anh Ha +7 more

This paper introduces a welfare-centric framework for designing institutional incentives, showing that optimizing for total social welfare often requires different incentive levels than those optimize…

stat.ML · cs.AI · cs.LG · Recent · May 28, 2026

Reward Learning from Best-of-$N$ Preference Data: Targets, Tradeoffs, and Design Principles

Rattana Pukdee, Maria-Florina Balcan, Pradeep Ravikumar

This paper analyzes Best-of-$N$ preference data, deriving explicit reward targets for independent-reference variants and establishing design principles for choosing $N$ and the base distribution to op…

cs.LG · cs.CL · Recent · Jun 2, 2026

Skill-RM: Unifying Heterogeneous Evaluation Criteria via Agent Skill

Tao Chen, Gangwei Jiang, Pengyu Cheng, Siyuan Huang +9 more

The paper proposes Skill-RM, a unified framework that treats reward modeling as an agentic task to consistently integrate diverse evaluation criteria, achieving superior performance over traditional m…

cs.GT · cs.CR · Recent · May 8, 2026

Game-Theoretic Analysis of Transaction Selection in DAG-Based Distributed Ledgers

Sebastian Müller, Alexandre Reiffers-Masson

The paper analyzes transaction selection strategies in DAG-based distributed ledgers using game theory, finding that Collaborative Fee Sharing (CFS) achieves superior performance compared to Random Fe…

cs.CR · cs.DC · cs.IT · Recent · Apr 15, 2026

Temporary Power Adjusting Withholding Attack

Mustafa Doger, Sennur Ulukus

The paper introduces Temporary Power Adjusting Withholding (T-PAW), a generalized and more potent block withholding attack than the existing PAW attack, demonstrating that this attack can yield signif…

cs.LG · cs.AI · Recent · May 29, 2026

ARCA: Adapter-Residual Credit Assignment When Token Signals Degenerate

Rodney Lafuente-Mercado

The paper introduces ARCA, a novel credit assignment method that measures token salience directly from the adapter's residual hidden state, addressing the degeneracy of standard intrinsic signals when…

cs.CR · cs.DC · cs.GT · Recent · Jun 3, 2026

Bitcoin After Block Rewards

Junhyuk Lee

This paper analyzes the conditions under which Bitcoin's security might fail due to miners deviating from honest mining when block rewards decline to zero, concluding that protocol mechanisms can miti…

cs.CR · cs.CE · Recent · Apr 5, 2026

Refunded but Rewarded: The Double Dip Attack on Cashback Reward Engines

S M Zia Ur Rashid, Suman Rath

The paper analyzes and documents various double-dip reward abuse attacks that exploit flaws in how cashback and reward engines handle transaction refunds, proposing formal invariants and defensive alg…

cs.LG · cs.AI · Recent · May 28, 2026

In-Context Reward Adaptation for Robust Preference Modeling

Zhenyu Sun, Zheng Xu, Ermin Wei

The paper proposes In-Context Reward Adaptation, a transformer-based framework that uses in-context learning and auxiliary signals (like human response time) to robustly model diverse and unseen human…

cs.AI · cs.DC · cs.MA · Recent · May 27, 2026

SwarmHarness: Skill-Based Task Routing via Decentralized Incentive-Aligned AI Agent Networks

Edwin Jose

SwarmHarness introduces a decentralized, incentive-aligned protocol enabling self-organizing compute swarms for AI tasks, eliminating the need for central coordinators or heavy blockchain infrastructu…

cs.LG · cs.AI · cs.IR · Recent · May 27, 2026

When LLM Reward Design Fails: Diagnostic-Driven Refinement for Sparse Structured RL

Youting Wang, Yuan Tang, Bowen Liu, Xuan Liu +1 more

The paper introduces a diagnostic-driven iterative refinement process for improving LLM-generated reward functions in sparse, structured reinforcement learning tasks, significantly boosting agent perf…

cs.LG · cs.AI · Recent · Jun 2, 2026

Using Reward Uncertainty to Induce Diverse Behaviour in Reinforcement Learning

Anthony GX-Chen, Ankit Anand, Gheorghe Comanici, Zaheer Abbas +6 more

The paper proposes a novel RL framework that naturally induces diverse agent behavior by reformulating the objective to treat the reward as a distribution over functions, making diversity a rational r…

cs.LG · cs.AI · cs.CL · Recent · Jun 3, 2026

Reinforcement Learning from Rich Feedback with Distributional DAgger

Rishabh Agrawal, Jacob Fein-Ashley, Paria Rashidinejad

This paper proposes a new imitation learning algorithm called DistIL that uses distributional feedback to strengthen policy improvement and regret guarantees.

stat.ML · cs.LG · Recent · Jun 1, 2026

ShaplEIG: Bayesian Experimental Design for Shapley Value Estimation

David Rundel, Fabian Fumagalli, Maximilian Muschalik, Bernd Bischl +1 more

ShaplEIG introduces a Bayesian experimental design framework to efficiently and adaptively estimate Shapley values by minimizing the number of required costly function evaluations.

cs.AI · Recent · May 27, 2026

OccuReward: LLM-Guided Occupant-Centric Reward Shaping for Demographic Equity in Grid-Interactive Buildings

Shadmehr Zaregarizi, Khashayar Yavari

OccuReward introduces an LLM-guided framework and a Comfort Equity Index (CEI) to shape building energy rewards, demonstrating that iterative refinement significantly improves occupant comfort equity…

cs.LG · econ.GN · stat.ML · Recent · Jun 3, 2026

Worker Utility as Hysteresis: A Preisach Model of Transaction Acceptance in Gig Labour Markets

Piotr Frydrych

The paper models latent worker preferences in gig labor markets using the Preisach hysteresis model, demonstrating that predicting acceptance rates can simultaneously reduce labor costs and increase s…

cs.LG · cs.AI · Recent · May 28, 2026

On Distributional Reinforcement Learning in Chaotic Dynamical Systems

James Rudd-Jones, Mirco Musolesi, María Pérez-Ortiz

The paper proposes using distributional Reinforcement Learning (RL) to stabilize learning in chaotic dynamical systems by optimizing the smooth evolution of the return distribution rather than individ…

cs.GT · cs.CR · cs.LG · Recent · May 8, 2026

Quotient Semivalues for False-Name-Resistant Data Attribution

Florian A. D. Burnat, Brittany I. Davidson

The paper introduces the quotient semivalue mechanism to provide fair data attribution that is resistant to contributors manipulating their reported identities by splitting or duplicating data.

cs.AI · Recent · May 28, 2026

Beyond Trajectory Rewards: Step-level Credit Assignment for Agentic Search via Graph Modeling

Yuchen Liu, Yingjie Feng, Lixiong Qin, Jiasi Chen +4 more

The paper introduces Graph-Distance Contribution Reward (GDCR) and Step Advantage Policy Optimization (SAPO) to provide fine-grained, step-level credit assignment for agentic search by modeling world…
