~ similar to 2605.28089· 18 results
Yanyan Luo, Xue Han, Chunxu Zhao, Ruiqiao Bai +4 more
The paper introduces ChildEval, a large-scale benchmark designed to systematically evaluate how well large language models can infer and follow complex, child-specific preferences during long-context…
KidsNanny is a two-stage multimodal content moderation pipeline that achieves high accuracy and efficiency in detecting child safety threats, particularly excelling in text-embedded content.
The paper introduces a generalized zero-shot benchmark for facial age estimation that ethically excludes children's data during training, demonstrating that current state-of-the-art models fail signif…
The study finds exploratory evidence that gender moderates how youth perceive privacy risks and benefits, influencing their protective behavior when using smart voice assistants.
Liang Wang, Xinyi Mou, Xiaoyou Liu, Tiannan Wang +2 more
The paper proposes a hierarchical framework, PHF (Practice-Habitus-Field), inspired by Bourdieu's Theory of Practice, to improve LLM personalization by modeling user behaviors at three distinct levels…
Junqi Liu, Salena Song, Yuhan Wang, Jiawei Mao +11 more
The paper introduces AutoMedBench, a novel workflow-aware benchmark that evaluates autonomous medical-AI agents across a five-stage research process, revealing that agents struggle most with validatio…
Zhengyang Tang, Ke Ji, Xidong Wang, Zihan Ye +18 more
The paper introduces MyPhoneBench, a new framework that demonstrates that current phone-use agents often fail to respect user privacy, even when successfully completing simple tasks, primarily due to…
The paper analyzed 25 popular mental health apps and found significant privacy gaps, revealing that most apps fail to disclose embedded trackers and dangerous permissions, undermining informed user co…
This study proposes a negotiation framework, using composite indices (RBTI and CATI), to explain how youth navigate competing privacy pressures when using smart voice assistants, finding that high usa…
Yunjin Qi, Zhaojun Jiang, Xuan Wu, Hanxi Pan +9 more
The paper introduces NICE, a novel, theory-grounded diagnostic benchmark for assessing the social intelligence of LLMs, which reveals that current frontier models consistently struggle with specific f…
Wenhao Wang, Peizhi Niu, Gongyi Zou, Xiyuan Yang +8 more
The paper introduces MCP-Persona, a novel benchmark designed to evaluate LLM agents' performance on real-world, personalized applications using the Model Context Protocol (MCP), revealing that current…
This paper evaluates multiple LLMs (DeepSeek-R1, OpenBioLLM-Llama3, Qwen 3.5) for generating privacy-safe, high-quality synthetic mental health reports, demonstrating their effectiveness in expanding…
Jiwon Kim, Maya Ajit, Sherry Gong, Soorya Ram Shimgekar +3 more
The paper introduces LLUMI, an open-source framework that improves LLM writing assistance for mental health support using community feedback, demonstrating comparable performance to proprietary models…
Jiahao Chen, Qi Zhang, Ruixiao Lin, Chunyi Zhou +6 more
The paper introduces the PrivacyIceberg framework to systematically categorize and empirically demonstrate the high risk of automated, deep personal profiling using LLM agents, revealing a significant…
Yanqiu Zhao, Dongying Zheng, Kaibo Huang, Yukun Wei +2 more
MaskClaw is an edge-side privacy arbitrator that protects sensitive data in GUI agent screenshots by combining local visual evidence, task-specific policies, and a skill-evolution mechanism.
The paper identifies five persistent, deep-seated behavioral patterns ('training strata') in LLMs, observed through long-term, intimate human-AI interaction, suggesting that training artifacts survive…
The paper proposes 'Think Fast, Talk Smart,' a pipeline that separates deterministic data analysis from LLM generation, showing that offloading recurring, structured tasks to code significantly improv…
The paper introduces PrivacySIM, an evaluation suite that benchmarks how well LLMs can simulate individual user privacy decisions based on persona attributes, finding that while conditioning improves…