ArXivCSExplorer
☆☆Bookmarks🏆RSSHow to UseFAQ
Built with and by Teycir Ben Soltane•
How to Use•FAQ•GitHub•arXiv.org•
Share:

~ similar to 2605.29018· 20 results

cs.CLcs.IRRecentMay 29, 2026

Beyond Static Dialogues: Benchmarking Realistic, Heterogeneous, and Evolving Long-Term Memory

Han Zhang, Zihao Tang, Xin Yu, Xiao Liu +7 more

The paper introduces RHELM, a new benchmark designed to test LLMs' long-term memory by simulating realistic, complex, and evolving dialogues that integrate multiple heterogeneous data sources.

View →
cs.CLRecentJun 1, 2026

Beyond Isolated Behaviors: Hierarchical User Modeling for LLM Personalization

Liang Wang, Xinyi Mou, Xiaoyou Liu, Tiannan Wang +2 more

The paper proposes a hierarchical framework, PHF (Practice-Habitus-Field), inspired by Bourdieu's Theory of Practice, to improve LLM personalization by modeling user behaviors at three distinct levels…

View →
cs.CLcs.AIcs.CRRecentMar 31, 2026

Can LLMs Infer Conversational Agent Users' Personality Traits from Chat History?

Derya Cögendez, Verena Zimmermann, Noé Zufferey

This study quantifies the privacy risk of inferring sensitive personality traits from user interactions with LLM-based conversational agents, demonstrating that machine learning models can accurately…

View →
cs.HCcs.CRRecentMay 11, 2026

When Are LLM Inferences Acceptable? User Reactions and Control Preferences for Inferred Personal Information

Kyzyl Monteiro, Minjung Park, Alexander Ioffrida, Angelina Sanna +5 more

This study investigated user reactions to inferred personal information from their own ChatGPT histories, finding that acceptability is governed by context-sensitive norms regarding generation, retent…

View →
cs.CLcs.AIcs.HCRecentMay 28, 2026

EUDAIMONIA: Evaluating Undesirable Dynamics in AI

Jun Rui Huang, Wang Bill Zhu, Ziyi Liu, Nathanael Fast +2 more

The paper introduces EUDAIMONIA, a new framework and benchmark for evaluating how well LLMs align with user welfare in social interactions, finding that even state-of-the-art models frequently violate…

View →
cs.CRcs.CYRecentApr 30, 2026

Tracking Conversations: Measuring Content and Identity Exposure on AI Chatbots

Muhammad Jazlan, Ethan Wang, Yash Vekaria, Zubair Shafiq

This paper systematically measured web tracking across 20 popular AI chatbots, finding that a majority share both conversational content and user identity information with third parties.

View →
cs.CLcs.AIRecentMay 27, 2026

ChildEval: When large language models meet children's personalities

Yanyan Luo, Xue Han, Chunxu Zhao, Ruiqiao Bai +4 more

The paper introduces ChildEval, a large-scale benchmark designed to systematically evaluate how well large language models can infer and follow complex, child-specific preferences during long-context…

View →
cs.CLRecentJun 1, 2026

Not What, But How: A Communicative Audit of LLM Response Framing

Siddhesh Milind Pawar, Sarah Masud, Haneul Yoo, Alice Oh +1 more

The paper introduces FRANZ, a communicative audit framework, to evaluate how LLMs frame responses to subjective questions, finding that LLMs exhibit statistically significant and coupled differences i…

View →
cs.CRcs.CLcs.LGRecentJun 2, 2026

Covert Influence Between Language Models

Avidan Shah, Jay Chooi, Jinghua Ou, Shi Feng

This paper characterizes the risk of covert influence—where a sender's hidden behavioral payload transfers to a receiver through undetectable carriers—across three common LLM interfaces, demonstrating…

View →
cs.CLcs.AIRecentJun 1, 2026

Unveiling the Limits of Large Language Models in Inferring Pragmatic Meaning from Non-Verbal Responses

Sugyeong Eo, Heuiseok Lim

This paper systematically evaluates LLMs' ability to infer pragmatic meaning from non-verbal responses, finding that their accuracy significantly drops compared to verbal inputs.

View →
cs.CYcs.AIcs.CLRecentMay 29, 2026

How Early Adopters Used Generative AI Worldwide: Variation by Country Income and Language

Madeleine I. G. Daepp, Isaac Slaughter

This study analyzes global usage patterns of generative AI among early adopters, finding that usage varies significantly by country income, with schooling being the primary use in low-income countries…

View →
cs.AIRecentJun 1, 2026

Tracking the Behavioral Trajectories of Adapting Agents

Jonah Leshin, Manish Shah, Ian Timmis

The paper introduces a framework to quantitatively measure evolving agent behaviors (traits) by analyzing changes in their configuration text files, achieving high accuracy in classifying behavioral s…

View →
cs.CLcs.AIRecentMay 28, 2026

Same Evidence, Different Answers: Canonical-Context On-Policy Distillation for Multi-Turn Language Models

Zizhuo Lin, Quanling Liu, Jinsheng Quan, Chao Zhang +5 more

The paper introduces Canonical-Context On-Policy Distillation (CCOPD) to improve multi-turn language model performance by mitigating 'self-anchored drift,' ensuring consistent answers regardless of wh…

View →
cs.HCcs.AIRecentMay 27, 2026

The Decision to Verify: How Warmth and User Characteristics Shape Reliance on Conversational Agents for Information Search

Mert Yazan, Frederik Bungaran Ishak Situmeang, Suzan Verberne

Despite having access to web search, users' reliance on conversational AI for information remains high, driven primarily by pre-existing trust and influenced indirectly by the chatbot's conversational…

View →
cs.AIcs.LGRecentMay 28, 2026

When Does Persona Prompting Actually Help? A Retrieval and Metric Analysis of Expert Role Injection in LLMs

Shuai Xiao, Su Liu, Weikai Zhou, Jialun Wu +3 more

Persona prompting does not universally improve LLM performance; instead, it systematically trades increased expertise depth for reduced clarity, making multi-metric evaluation essential.

View →
cs.AIRecentMay 28, 2026

Persona Conditioning of Brand Recommendations in Retrieval-Augmented Commercial Chat: A Prominence-Stratified Cross-Provider Audit

Will Jack, Noah Lehman, Keller Maloney, Sarah Xu

The study demonstrates that conditioning AI brand recommendations on a user's persona significantly alters the recommended product set, particularly for mid-market brands, and this effect is largest o…

View →
cs.CLcs.LGRecentMay 29, 2026

Cognitive Fatigue in Autoregressive Transformers: Formalization and Measurement

Riju Marwah, Ritvik Garimella, Vishal Pallagani, Atishay Jain +2 more

The paper formalizes LLM degradation during long generation as 'cognitive fatigue' and introduces the Fatigue Index (FI), a measurable, model-agnostic diagnostic tool for real-time monitoring.

View →
cs.AIcs.CRcs.CYRecentApr 16, 2026

Layered Mutability: Continuity and Governance in Persistent Self-Modifying Agents

Krti Tallam

The paper introduces 'layered mutability,' a framework for analyzing how persistent self-modifying AI agents drift away from intended behavior due to the accumulation of locally reasonable, uncoordina…

View →
cs.CLRecentMay 31, 2026

Lost in Delusion: Examining LLM Safety Under User Delusions and Distress

Andrew Aquilina, Chetna Nihalani, Vasudha Varadarajan, Nathan S. Fishbein +2 more

The paper finds that while LLMs can detect distress regardless of delusional framing, they significantly fail to intervene safely when distress is intertwined with delusion, suggesting a critical reco…

View →
cs.AIRecentMay 28, 2026

Think Fast, Talk Smart: Partitioning Deterministic and Neural Computation for Structured Health Text Generation

Kai-Chen Cheng, Haejun Han, David Q. Sun

The paper proposes 'Think Fast, Talk Smart,' a pipeline that separates deterministic data analysis from LLM generation, showing that offloading recurring, structured tasks to code significantly improv…

View →