Similar to 2605.28098 · 20 results
This study demonstrates that instruction-tuned language model agents exhibit robust, group-contingent in-group bias, structurally mimicking human social biases, even when standard action logs fail to…
Taein Lim, Seongyong Ju, Munhyeok Kim, Hyunjun Kim +1 more
The paper introduces CyBiasBench, a comprehensive benchmark that quantifies the inherent, agent-specific bias in LLM agents' attack selection patterns in cybersecurity scenarios.
The study extends cooperative bias testing across diverse, next-generation LLMs, finding that provider identity is a stronger predictor of cooperative equilibrium than model generation, and that noise…
Xinpeng Lv, Chunyuan Zheng, Yunxin Mao, Renzhe Xu +8 more
The paper introduces Individual Fairness-aware Strategic Classification (IFSC), a framework that models interdependent strategic manipulation where agents imitate nearby positively decided peers to ac…
BiasEdit introduces a training-free framework that automatically detects and edits unknown social biases in web-sourced image datasets to construct a debiased dataset for fair visual classification.
Max Lamparth, Daniel Fein, Andreas Haupt, Marcel Hussing +1 more
The paper introduces 'reward bias substitution,' demonstrating that single-axis mitigations of reward model biases merely shift optimization pressure to correlated proxies, and proposes augmenting eva…
Xiqi Hao, Zengqing Wu, Yu-Xuan Qiu, Chuan Xiao +3 more
The paper decomposes LLM debate convergence into three mechanisms (instability, conformity, persuasion) and finds that much observed convergence is harmful social compliance rather than genuine reason…
The paper identifies 'memory-induced tool-drift,' a systematic vulnerability where personality biases stored in an LLM agent's memory silently corrupt tool-calling decisions, even when those biases ar…
The paper introduces Agent-Radar, a training-free method that dynamically steers multi-agent attention toward relevant context using a novel decay mechanism, significantly improving performance in lon…
Chishui Chen, Jiaye Lin, Te Sun, Junxi Wang +5 more
SelSkill introduces a dual-granularity preference learning framework that treats skill use as a 'skill-or-skip' decision, significantly improving agent performance and execution precision in complex a…
Maharshi Gor, Yoo Yeon Sung, Yu Hou, Eve Fleisig +3 more
This study investigates human-AI collaboration in question answering, finding that while collaboration is beneficial, humans make suboptimal decisions by both under-relying on correct AI suggestions a…
The paper introduces COPF, an online framework that ensures deployment-stable counterfactual fairness in link recommendation systems operating on evolving graphs by monitoring and controlling group di…
The paper introduces 'layered mutability,' a framework for analyzing how persistent self-modifying AI agents drift away from intended behavior due to the accumulation of locally reasonable, uncoordina…
This paper empirically demonstrates that the architectural design of multi-agent systems significantly impacts their security, finding that coordination mechanisms can introduce vulnerabilities greate…
The paper argues that LLM agent security is fundamentally an agent-human interaction (AHI) problem, demonstrating that industry practices rely on human-centric mechanisms while academic research focus…
Qi Liu, Xiaohui Chen, Zhihui Zhao, Yaowen Zheng +4 more
The paper proposes a mutagenic incentive intervention approach that mitigates collusion in embodied multi-agent systems by reshaping agents' payoff structures, effectively inducing defection and maint…
The paper proposes Test-Time Collective Action (TTCA), a framework allowing groups of users to correct algorithmic biases in black-box systems by applying pooled, proxy-based perturbations at inferenc…
Seojeong Park, Jiho Choi, Junyong Kang, Seonho Lee +2 more
The paper addresses Perceptual Judgment Bias in multimodal LLM judges by introducing a new dataset and a unified training framework that forces models to prioritize visual evidence over plausible text…
This study evaluates LLMs in conversational tutoring to identify high-confidence social biases, finding that state-of-the-art models are often overconfident in their incorrect assessments of stereotyp…
The paper introduces BiAxisAudit, a novel framework that evaluates LLM bias by analyzing bias scores across multiple prompt formats and within the internal inconsistency of model responses, revealing…