~ similar to 2605.30685· 20 results
RealityTest introduces a large-scale, multimodal, and multilingual benchmark using real-world human data to test how AI systems disclose their identity, finding that context and phrasing are more crit…
This study analyzes ClinicalTrials.gov records to track the rising trend of AI in clinical trials and demonstrates that a hybrid human-AI screening approach is viable but requires clearer reporting of…
The paper introduces CARTE, a new benchmark designed to test how well large language models understand fine-grained, regionally differentiated knowledge across the 13 metropolitan regions of France, r…
This paper systematically measured web tracking across 20 popular AI chatbots, finding that a majority share both conversational content and user identity information with third parties.
Aakash Pant, Kavya Shah, Apoorv Agnihotri, Sneha Nikam +2 more
The paper critiques current AI benchmarking practices for low-resource settings, arguing that evaluation must shift focus from isolated model performance to the holistic performance of the deployed sy…
This paper analyzes top-tier cybersecurity papers to find evidence of generative AI's influence, finding a post-2022 increase in AI-associated marker words and a general drift toward higher lexical co…
Analyzing longitudinal data from 12,000 Copilot users, the paper finds that individual user habits regarding LLM interaction are highly sticky and difficult to change, and that existing datasets may o…
The paper successfully demonstrates that Large Language Models (LLMs) can be induced to adopt coherent, human-like value structures, showing strong alignment with human psychological patterns.
The paper proposes viewing national AI development, specifically in France, as a 'national AI learning system' governed by a controlled balance between information injection and entropy dissipation, a…
The paper introduces a diagnostic framework to decompose multilingual LLM performance variance, showing that language identity and model-benchmark interactions are key drivers of performance gaps.
This paper uses machine learning to model a country's GDP based on working hours and productivity, demonstrating that the differing relative importance of these two factors between Germany and the USA…
The paper audits six LLMs across four languages, finding that their gender stereotyping is significantly wider than human baselines and that cross-lingual translation fundamentally alters the nature o…
Despite having access to web search, users' reliance on conversational AI for information remains high, driven primarily by pre-existing trust and influenced indirectly by the chatbot's conversational…
This study surveyed higher education practitioners to map their beliefs and behaviors regarding AI integration, finding that while they view AI favorably, institutional barriers and gaps in design-ori…
Md Arid Hasan, Ruwad Naswan, Farhan Samir, Sharifa Sultana +1 more
The paper demonstrates that using English prompts causes large language models to prioritize globally dominant narratives over local cultural knowledge, even when local evidence is provided.
Jun Rui Huang, Wang Bill Zhu, Ziyi Liu, Nathanael Fast +2 more
The paper introduces EUDAIMONIA, a new framework and benchmark for evaluating how well LLMs align with user welfare in social interactions, finding that even state-of-the-art models frequently violate…
Yanyan Luo, Xue Han, Chunxu Zhao, Ruiqiao Bai +4 more
The paper introduces ChildEval, a large-scale benchmark designed to systematically evaluate how well large language models can infer and follow complex, child-specific preferences during long-context…
Jiaxun Cao, Yu Dong, Chunxi Zhan, Rithvik Neti +2 more
The paper investigates how users perceive and utilize security and privacy transparency in consumer-facing generative AI, finding that users rely on proxies like popularity and require actionable, tru…
The paper argues that purported anthropomorphic attributes of LLMs are not unique to language models but are substrate-dependent, demonstrating this by training a neural network on the game Age of Emp…
Ran Jin, Liu Wang, Shidong Pan, Luona Xu +2 more
This study investigates user perceptions of privacy risks associated with GenAI smartphones, finding that users express heightened concerns across the entire data lifecycle and suggest comprehensive,…