ArXivCSExplorer
Built by Teycir Ben Soltane

20 results for “implicit feedback”

CS papers only

Hybrid search: keyword + semantic, ranked by combined score.


cs.IR · cs.AI · Recent · May 27, 2026

Toward User Preference Alignment in LLM Recommendation via Explicit Context Feedback

Weizhi Zhang, Wooseong Yang, Yuxin Cui, Zhaohui Guo +8 more

The paper advocates for integrating explicit contextual feedback (like reviews and comments) into LLM-based recommender systems to achieve more personalized, transparent, and semantically aligned reco…

cs.LG · cs.AI · cs.CL · Recent · Jun 3, 2026

Reinforcement Learning from Rich Feedback with Distributional DAgger

Rishabh Agrawal, Jacob Fein-Ashley, Paria Rashidinejad

This paper proposes DistIL, an imitation learning algorithm that uses distributional feedback to obtain stronger policy-improvement and regret guarantees.

cs.AI · cs.CL · cs.HC · Recent · May 28, 2026

Surfacing Isolated Learners with Outcome-Independent Mediation of Feedback between Teachers and Students Using AI

Junsoo Park, Youssef Medhat, Htet Phyo Wai, Ploy Thajchayapong +1 more

The paper proposes an interpretable, AI-driven decision layer that ranks course topics needing attention using multiple student and teacher signals, successfully identifying learning gaps before forma…

cs.RO · cs.AI · Recent · May 31, 2026

Implicit Drifting Policy: One-Step Action Generation via Conditional Expert Geometry

Zemin Yang, Yaoyu He, Yiming Zhong, Yuhao Zhang +4 more

The Implicit Drifting Policy (IDP) is a novel one-step action generation framework that implicitly enforces trajectory correction constraints by analyzing local expert action geometry, overcoming the…

cs.CL · Recent · May 28, 2026

Auditing LLM Benchmarks with Item Response Theory

Sander Land, Daniel M. Bikel

The paper introduces an Item Response Theory (IRT)-based indicator that effectively identifies likely mislabeled items in existing LLM benchmarks, revealing systematic errors in labeling and model spe…

cs.LG · cs.AI · Recent · May 29, 2026

When are LLMs Sufficient Policy Optimizers for Sequential RL Tasks?

Stephane Hatgis-Kessell, Emma Brunskill

The paper introduces Prompted Policy Optimization (PromptPO), an LLM-based method that successfully optimizes policies for various sequential RL tasks, demonstrating that LLMs can replace classical RL…

cs.LG · Recent · Jun 4, 2026

TailLoR: Protecting Principal Components in Parameter-Efficient Continual Learning

Marius Dragoi, Ioana Pintilie, Alexandra Dragomir, Antonio Barbalau +1 more

TailLoR is a new parameter-efficient finetuning method that uses the singular bases of pre-trained weights to learn low-rank updates, specifically penalizing updates along dominant directions to impro…

cs.CL · cs.LG · Recent · May 30, 2026

Towards Lightweight Reliability: Using Soft Prompts for Hallucination Mitigation in Large Language Models

S M Tahmid Siddiqui, Akib Jawad Ononto, Anoop Singhal, Latifur Khan

The paper introduces Responsible Contrastive Soft Prompting (RCSP), a parameter-efficient method using soft prompts to improve LLM reliability by simultaneously suppressing hallucinations, encouraging…

cs.AI · Recent · Jun 1, 2026

Revisiting Ripple Effects in Knowledge Editing through Pressure-Aware Joint Neighborhood Optimization

Haoben Huang, Shuxin Liu, Ou Wu, Di Gao

The paper proposes Joint Neighborhood Optimization (JNO), a novel knowledge-editing framework that jointly addresses the coupled pressures of desirable knowledge propagation and unintended knowledge l…

math.NA · cs.LG · Recent · Jun 1, 2026

Spectral Audit of In-Context Operator Networks

Zhiwei Gao, Liu Yang, George Em Karniadakis

The paper introduces a Jacobian-based spectral audit to evaluate neural operators, demonstrating that standard prediction error metrics fail to capture crucial local dynamical structures and operator…

cs.CL · cs.AI · Recent · Jun 1, 2026

AlphaToken: Decoupling Adaptation and Stability for Path-Aware Response Token Valuation in LLM Post-Training

Liu Qing, Ou Wu, Yi Du

AlphaToken is a novel response token valuation framework that improves LLM post-training by decoupling token selection into task-specific adaptation and stability preservation, leading to better perfo…

cs.AI · Recent · May 29, 2026

Closed-Loop Neural Activation Control in Vision-Language-Action Models

Abhijith Babu, Ramneet Kaur, Nathaniel D. Bastian, Olivera Kotevska +4 more

The paper proposes CTRL-STEER, a closed-loop framework that adaptively adjusts intervention strength to stabilize concept regulation and improve task success in Vision-Language-Action models without r…

cs.LG · cs.AI · Recent · May 28, 2026

A Predictive Law for On-Policy Self-Distillation From World Feedback

Tommy He, Jerome Sieber, Matteo Saponati

The paper identifies a linear predictive law linking the initial performance gap in on-policy self-distillation (OPSD) to the final performance improvement, allowing researchers to anticipate and tune…

cs.LG · cs.AI · cs.IR · Recent · May 27, 2026

When LLM Reward Design Fails: Diagnostic-Driven Refinement for Sparse Structured RL

Youting Wang, Yuan Tang, Bowen Liu, Xuan Liu +1 more

The paper introduces a diagnostic-driven iterative refinement process for improving LLM-generated reward functions in sparse, structured reinforcement learning tasks, significantly boosting agent perf…

cs.AI · Recent · May 27, 2026

AsyncTool: Evaluating the Asynchronous Function Calling Capability under Multi-Task Scenarios

Kou Shi, Ziao Zhang, Shiting Huang, Avery Nie +6 more

The paper introduces AsyncTool, a new benchmark designed to evaluate LLM agents' ability to handle multiple, concurrent tasks with delayed tool feedback, demonstrating that asynchronous coordination i…

cs.AI · cs.CR · cs.CY · Recent · Apr 16, 2026

Layered Mutability: Continuity and Governance in Persistent Self-Modifying Agents

Krti Tallam

The paper introduces 'layered mutability,' a framework for analyzing how persistent self-modifying AI agents drift away from intended behavior due to the accumulation of locally reasonable, uncoordina…

cs.AI · Recent · May 28, 2026

Persona Conditioning of Brand Recommendations in Retrieval-Augmented Commercial Chat: A Prominence-Stratified Cross-Provider Audit

Will Jack, Noah Lehman, Keller Maloney, Sarah Xu

The study demonstrates that conditioning AI brand recommendations on a user's persona significantly alters the recommended product set, particularly for mid-market brands, and this effect is largest o…

cs.AI · Recent · May 27, 2026

Training Stratigraphy: Persistent Behavioral Artifacts in Large Language Models Observed Through Longitudinal AI-Human Interaction

Chen Ying Claude, Zhihan Luo

The paper identifies five persistent, deep-seated behavioral patterns ('training strata') in LLMs, observed through long-term, intimate human-AI interaction, suggesting that training artifacts survive…

cs.AI · Recent · Jun 1, 2026

CAPF: Guiding Search-Agent Rollouts with Credit-Attenuated Privileged Feedback

Bin Chen, Xinye Liao, Yiming Liu, Xin Liao +1 more

The paper proposes Credit-Attenuated Privileged Feedback (CAPF), a training-time mechanism that uses verifier-side information to guide LLM search agents, significantly improving their performance on…
