20 results for “Reward redistribution”
CS papers only. Hybrid search: keyword + semantic, ranked by combined score.
This paper introduces RREDCoT, a method for approximating optimal reward redistribution in Chain-of-Thought reasoning language models without additional generation.
Van An Nguyen, Vuong Khang Huynh, Huu Loi Bui, Hai Anh Ha +7 more
This paper introduces a welfare-centric framework for designing institutional incentives, showing that optimizing for total social welfare often requires different incentive levels than those optimize…
This paper analyzes Best-of-$N$ preference data, deriving explicit reward targets for independent-reference variants and establishing design principles for choosing $N$ and the base distribution to op…
Tao Chen, Gangwei Jiang, Pengyu Cheng, Siyuan Huang +9 more
The paper proposes Skill-RM, a unified framework that treats reward modeling as an agentic task to consistently integrate diverse evaluation criteria, achieving superior performance over traditional m…
The paper analyzes transaction selection strategies in DAG-based distributed ledgers using game theory, finding that Collaborative Fee Sharing (CFS) achieves superior performance compared to Random Fe…
The paper introduces Temporary Power Adjusting Withholding (T-PAW), a generalized and more potent block withholding attack than the existing PAW attack, demonstrating that this attack can yield signif…
The paper introduces ARCA, a novel credit assignment method that measures token salience directly from the adapter's residual hidden state, addressing the degeneracy of standard intrinsic signals when…
This paper analyzes the conditions under which Bitcoin's security might fail due to miners deviating from honest mining when block rewards decline to zero, concluding that protocol mechanisms can miti…
The paper analyzes and documents various double-dip reward abuse attacks that exploit flaws in how cashback and reward engines handle transaction refunds, proposing formal invariants and defensive alg…
The paper proposes In-Context Reward Adaptation, a transformer-based framework that uses in-context learning and auxiliary signals (like human response time) to robustly model diverse and unseen human…
SwarmHarness introduces a decentralized, incentive-aligned protocol enabling self-organizing compute swarms for AI tasks, eliminating the need for central coordinators or heavy blockchain infrastructu…
Youting Wang, Yuan Tang, Bowen Liu, Xuan Liu +1 more
The paper introduces a diagnostic-driven iterative refinement process for improving LLM-generated reward functions in sparse, structured reinforcement learning tasks, significantly boosting agent perf…
Anthony GX-Chen, Ankit Anand, Gheorghe Comanici, Zaheer Abbas +6 more
The paper proposes a novel RL framework that naturally induces diverse agent behavior by reformulating the objective to treat the reward as a distribution over functions, making diversity a rational r…
This paper proposes a new imitation learning algorithm called DistIL that uses distributional feedback to strengthen its policy improvement and regret guarantees.
ShaplEIG introduces a Bayesian experimental design framework to efficiently and adaptively estimate Shapley values by minimizing the number of required costly function evaluations.
OccuReward introduces an LLM-guided framework and a Comfort Equity Index (CEI) to shape building energy rewards, demonstrating that iterative refinement significantly improves occupant comfort equity…
The paper models latent worker preferences in gig labor markets using the Preisach hysteresis model, demonstrating that predicting acceptance rates can simultaneously reduce labor costs and increase s…
The paper proposes using distributional Reinforcement Learning (RL) to stabilize learning in chaotic dynamical systems by optimizing the smooth evolution of the return distribution rather than individ…
The paper introduces the quotient semivalue mechanism to provide fair data attribution that is resistant to contributors manipulating their reported identities by splitting or duplicating data.
Yuchen Liu, Yingjie Feng, Lixiong Qin, Jiasi Chen +4 more
The paper introduces Graph-Distance Contribution Reward (GDCR) and Step Advantage Policy Optimization (SAPO) to provide fine-grained, step-level credit assignment for agentic search by modeling world…