20 results for “implicit feedback”
CS papers onlyHybrid search: Keyword + semantic, ranked by combined score.ⓘ
Want pure semantic search? Try claim verification →
Weizhi Zhang, Wooseong Yang, Yuxin Cui, Zhaohui Guo +8 more
The paper advocates for integrating explicit contextual feedback (like reviews and comments) into LLM-based recommender systems to achieve more personalized, transparent, and semantically aligned reco…
This paper proposes a new imitation learning algorithm called DistIL that uses distributional feedback to improve policy improvement and regret guarantees.
Junsoo Park, Youssef Medhat, Htet Phyo Wai, Ploy Thajchayapong +1 more
The paper proposes an interpretable, AI-driven decision layer that ranks course topics needing attention using multiple student and teacher signals, successfully identifying learning gaps before forma…
Zemin Yang, Yaoyu He, Yiming Zhong, Yuhao Zhang +4 more
The Implicit Drifting Policy (IDP) is a novel one-step action generation framework that implicitly enforces trajectory correction constraints by analyzing local expert action geometry, overcoming the…
The paper introduces an Item Response Theory (IRT)-based indicator that effectively identifies likely mislabeled items in existing LLM benchmarks, revealing systematic errors in labeling and model spe…
The paper introduces Prompted Policy Optimization (PromptPO), an LLM-based method that successfully optimizes policies for various sequential RL tasks, demonstrating that LLMs can replace classical RL…
TailLoR is a new parameter-efficient finetuning method that uses the singular bases of pre-trained weights to learn low-rank updates, specifically penalizing updates along dominant directions to impro…
The paper introduces Responsible Contrastive Soft Prompting (RCSP), a parameter-efficient method using soft prompts to improve LLM reliability by simultaneously suppressing hallucinations, encouraging…
The paper proposes Joint Neighborhood Optimization (JNO), a novel knowledge-editing framework that jointly addresses the coupled pressures of desirable knowledge propagation and unintended knowledge l…
The paper introduces a Jacobian-based spectral audit to evaluate neural operators, demonstrating that standard prediction error metrics fail to capture crucial local dynamical structures and operator…
AlphaToken is a novel response token valuation framework that improves LLM post-training by decoupling token selection into task-specific adaptation and stability preservation, leading to better perfo…
The paper proposes CTRL-STEER, a closed-loop framework that adaptively adjusts intervention strength to stabilize concept regulation and improve task success in Vision-Language-Action models without r…
The paper identifies a linear predictive law linking the initial performance gap in on-policy self-distillation (OPSD) to the final performance improvement, allowing researchers to anticipate and tune…
Youting Wang, Yuan Tang, Bowen Liu, Xuan Liu +1 more
The paper introduces a diagnostic-driven iterative refinement process for improving LLM-generated reward functions in sparse, structured reinforcement learning tasks, significantly boosting agent perf…
Kou Shi, Ziao Zhang, Shiting Huang, Avery Nie +6 more
The paper introduces AsyncTool, a new benchmark designed to evaluate LLM agents' ability to handle multiple, concurrent tasks with delayed tool feedback, demonstrating that asynchronous coordination i…
The paper introduces 'layered mutability,' a framework for analyzing how persistent self-modifying AI agents drift away from intended behavior due to the accumulation of locally reasonable, uncoordina…
The study demonstrates that conditioning AI brand recommendations on a user's persona significantly alters the recommended product set, particularly for mid-market brands, and this effect is largest o…
The paper identifies five persistent, deep-seated behavioral patterns ('training strata') in LLMs, observed through long-term, intimate human-AI interaction, suggesting that training artifacts survive…
Bin Chen, Xinye Liao, Yiming Liu, Xin Liao +1 more
The paper proposes Credit-Attenuated Privileged Feedback (CAPF), a training-time mechanism that uses verifier-side information to guide LLM search agents, significantly improving their performance on…