Papers similar to 2605.28098

~ similar to 2605.28098· 20 results

cs.AIRecentMay 27, 2026

Human-like in-group bias in instruction-tuned language model agents

This study demonstrates that instruction-tuned language model agents exhibit robust, group-contingent in-group bias, structurally mimicking human social biases, even when standard action logs fail to…

View →

cs.CRcs.AIRecentMay 8, 2026

CyBiasBench: Benchmarking Bias in LLM Agents for Cyber-Attack Scenarios

Taein Lim, Seongyong Ju, Munhyeok Kim, Hyunjun Kim +1 more

The paper introduces CyBiasBench, a comprehensive benchmark that quantifies the inherent, agent-specific bias in LLM agents' attack selection patterns in cybersecurity scenarios.

View →

cs.MAcs.AIcs.GTRecentMay 28, 2026

Evolutionary Dynamics of Cooperation in Next-Generation LLM Agent Systems: A Cross-Provider Empirical Extension

Francisco León Zúñiga Bolívar

The study extends cooperative bias testing across diverse, next-generation LLMs, finding that provider identity is a stronger predictor of cooperative equilibrium than model generation, and that noise…

View →

cs.LGcs.AIRecentMay 30, 2026

Beyond Independent Manipulation: Individual Fairness-aware Strategic Classification with Peer Imitation

Xinpeng Lv, Chunyuan Zheng, Yunxin Mao, Renzhe Xu +8 more

The paper introduces Individual Fairness-aware Strategic Classification (IFSC), a framework that models interdependent strategic manipulation where agents imitate nearby positively decided peers to ac…

View →

cs.CVcs.AIRecentMay 27, 2026

BiasEdit: A Training-Free Bias-Detect-and-Edit Framework for Learning Fair Visual Classifiers

Jungwook Seo, Yoonsik Park, Changmin Lee, Sungyong Baik

BiasEdit introduces a training-free framework that automatically detects and edits unknown social biases in web-sourced image datasets to construct a debiased dataset for fair visual classification.

View →

cs.AIRecentMay 27, 2026

Reward Bias Substitution: Single-Axis Bias Mitigations Redirect Optimization Pressure

Max Lamparth, Daniel Fein, Andreas Haupt, Marcel Hussing +1 more

The paper introduces 'reward bias substitution,' demonstrating that single-axis mitigations of reward model biases merely shift optimization pressure to correlated proxies, and proposes augmenting eva…

View →

cs.CLRecentMay 30, 2026

Not All Flips Are Conformity: Decomposing Stance Convergence in Multi-Agent LLM Debate

Xiqi Hao, Zengqing Wu, Yu-Xuan Qiu, Chuan Xiao +3 more

The paper decomposes LLM debate convergence into three mechanisms (instability, conformity, persuasion) and finds that much observed convergence is harmful social compliance rather than genuine reason…

View →

cs.CRcs.LGRecentMay 24, 2026

Memory-Induced Tool-Drift in LLM Agents

Mahavir Dabas, Jihyun Jeong, Ming Jin, Ruoxi Jia

The paper identifies 'memory-induced tool-drift,' a systematic vulnerability where personality biases stored in an LLM agent's memory silently corrupt tool-calling decisions, even when those biases ar…

View →

cs.AIRecentMay 28, 2026

Enhancing Multi-Agent Communication through Attention Steering with Context Relevance

Hongxiang Zhang, Yuan Tian, Tianyi Zhang

The paper introduces Agent-Radar, a training-free method that dynamically steers multi-agent attention toward relevant context using a novel decay mechanism, significantly improving performance in lon…

View →

cs.CLcs.AIRecentMay 30, 2026

Skill or Skip? Learning Selective Skill Invocation in Agentic Tasks via Dual-Granularity Preference Learning

Chishui Chen, Jiaye Lin, Te Sun, Junxi Wang +5 more

SelSkill introduces a dual-granularity preference learning framework that treats skill use as a 'skill-or-skip' decision, significantly improving agent performance and execution precision in complex a…

View →

cs.AIcs.CLcs.HCRecentMay 27, 2026

AI, Take the Wheel: What Drives Delegation and Trust in Human-Computer Cooperative Question Answering?

Maharshi Gor, Yoo Yeon Sung, Yu Hou, Eve Fleisig +3 more

This study investigates human-AI collaboration in question answering, finding that while collaboration is beneficial, humans make suboptimal decisions by both under-relying on correct AI suggestions a…

View →

cs.LGcs.AIRecentMay 30, 2026

COPF: An Online Framework for Deployment-Stable Counterfactual Fairness in Evolving Graphs

Sheng'en Li, Dongmian Zou

The paper introduces COPF, an online framework that ensures deployment-stable counterfactual fairness in link recommendation systems operating on evolving graphs by monitoring and controlling group di…

View →

cs.AIcs.CRcs.CYRecentApr 16, 2026

Layered Mutability: Continuity and Governance in Persistent Self-Modifying Agents

Krti Tallam

The paper introduces 'layered mutability,' a framework for analyzing how persistent self-modifying AI agents drift away from intended behavior due to the accumulation of locally reasonable, uncoordina…

View →

cs.MAcs.CRcs.LGRecentApr 25, 2026

Architecture Matters for Multi-Agent Security

Ben Hagag, William L. Anderson, Christian Schroeder de Witt, Sarah Scheffler

This paper empirically demonstrates that the architectural design of multi-agent systems significantly impacts their security, finding that coordination mechanisms can introduce vulnerabilities greate…

View →

cs.CRRecentMay 23, 2026

Reframing LLM Agent Security as an Agent-Human Interaction Problem

Peiran Wang, Ying Li, Yuan Tian

The paper argues that LLM agent security is fundamentally an agent-human interaction (AHI) problem, demonstrating that industry practices rely on human-centric mechanisms while academic research focus…

View →

cs.CRcs.MARecentApr 26, 2026

Breaking the Secret: Economic Interventions for Combating Collusion in Embodied Multi-Agent Systems

Qi Liu, Xiaohui Chen, Zhihui Zhao, Yaowen Zheng +4 more

The paper proposes a mutagenic incentive intervention approach that mitigates collusion in embodied multi-agent systems by reshaping agents' payoff structures, effectively inducing defection and maint…

View →

cs.LGcs.CRRecentMay 26, 2026

Test-Time Collective Action: Proxy-Based Perturbations for Correcting Algorithmic Harms

Meghana Bhange, Ulrich Aïvodji, Elliot Creager

The paper proposes Test-Time Collective Action (TTCA), a framework allowing groups of users to correct algorithmic biases in black-box systems by applying pooled, proxy-based perturbations at inferenc…

View →

cs.CVcs.AIRecentJun 1, 2026

Mitigating Perceptual Judgment Bias in Multimodal LLM-as-a-Judge via Perceptual Perturbation and Reward Modeling

Seojeong Park, Jiho Choi, Junyong Kang, Seonho Lee +2 more

The paper addresses Perceptual Judgment Bias in multimodal LLM judges by introducing a new dataset and a unified training framework that forces models to prioritize visual evidence over plausible text…

View →

cs.CLcs.AIRecentJun 1, 2026

Identifying High-Confidence Social Biases in LLMs for Trustworthy Conversational Tutoring Agents

Aitor Arronte Alvarez, Naiyi Xie Fincham

This study evaluates LLMs in conversational tutoring to identify high-confidence social biases, finding that state-of-the-art models are often overconfident in their incorrect assessments of stereotyp…

View →

cs.CLcs.CRRecentMay 9, 2026

BiAxisAudit: A Novel Framework to Evaluate LLM Bias Across Prompt Sensitivity and Response-Layer Divergence

Jialing Gan, Junhao Dong, Songze Li

The paper introduces BiAxisAudit, a novel framework that evaluates LLM bias by analyzing bias scores across multiple prompt formats and within the internal inconsistency of model responses, revealing…

View →