Papers similar to 2605.28089

~ similar to 2605.28089· 18 results

cs.CLcs.AIRecentMay 27, 2026

ChildEval: When large language models meet children's personalities

Yanyan Luo, Xue Han, Chunxu Zhao, Ruiqiao Bai +4 more

The paper introduces ChildEval, a large-scale benchmark designed to systematically evaluate how well large language models can infer and follow complex, child-specific preferences during long-context…

View →

cs.CVcs.CRRecentMar 17, 2026

KidsNanny: A Two-Stage Multimodal Content Moderation Pipeline Integrating Visual Classification, Object Detection, OCR, and Contextual Reasoning for Child Safety

Viraj Panchal, Tanmay Talsaniya, Parag Patel, Meet Patel

KidsNanny is a two-stage multimodal content moderation pipeline that achieves high accuracy and efficiency in detecting child safety threats, particularly excelling in text-embedded content.

View →

cs.CVcs.AIRecentMay 28, 2026

Toward Ethical Facial Age Estimation: A Generalized Zero-Shot Benchmark Without Training on Children's Data

Caio Petrucci, Leo Sampaio Ferraz Ribeiro, Sandra Avila

The paper introduces a generalized zero-shot benchmark for facial age estimation that ethically excludes children's data during training, demonstrating that current state-of-the-art models fail signif…

View →

cs.CRcs.AIcs.CYRecentMar 28, 2026

Gender-Based Heterogeneity in Youth Privacy-Protective Behavior for Smart Voice Assistants: Evidence from Multigroup PLS-SEM

Molly Campbell, Yulia Bobkova, Ajay Kumar Shrestha

The study finds exploratory evidence that gender moderates how youth perceive privacy risks and benefits, influencing their protective behavior when using smart voice assistants.

View →

cs.CLRecentJun 1, 2026

Beyond Isolated Behaviors: Hierarchical User Modeling for LLM Personalization

Liang Wang, Xinyi Mou, Xiaoyou Liu, Tiannan Wang +2 more

The paper proposes a hierarchical framework, PHF (Practice-Habitus-Field), inspired by Bourdieu's Theory of Practice, to improve LLM personalization by modeling user behaviors at three distinct levels…

View →

cs.AIRecentJun 1, 2026

AutoMedBench: Towards Medical AutoResearch with Agentic AI Models

Junqi Liu, Salena Song, Yuhan Wang, Jiawei Mao +11 more

The paper introduces AutoMedBench, a novel workflow-aware benchmark that evaluates autonomous medical-AI agents across a five-stage research process, revealing that agents struggle most with validatio…

View →

cs.CRcs.AIcs.CLRecentApr 1, 2026

Do Phone-Use Agents Respect Your Privacy?

Zhengyang Tang, Ke Ji, Xidong Wang, Zihan Ye +18 more

The paper introduces MyPhoneBench, a new framework that demonstrates that current phone-use agents often fail to respect user privacy, even when successfully completing simple tasks, primarily due to…

View →

cs.CRcs.CYRecentMay 3, 2026

What's on Your Mind? Exploring Privacy of Mental Health Apps

Chloe Georgiou, Hans Lu, Emiliano De Cristofaro, Gene Tsudik

The paper analyzed 25 popular mental health apps and found significant privacy gaps, revealing that most apps fail to disclose embedded trackers and dangerous permissions, undermining informed user co…

View →

cs.CRcs.AIcs.CYRecentApr 4, 2026

Negotiating Privacy with Smart Voice Assistants: Risk-Benefit and Control-Acceptance Tensions

Molly Campbell, Mohamad Sheikho Al Jasem, Ajay Kumar Shrestha

This study proposes a negotiation framework, using composite indices (RBTI and CATI), to explain how youth navigate competing privacy pressures when using smart voice assistants, finding that high usa…

View →

cs.AIRecentMay 28, 2026

NICE: A Theory-Grounded Diagnostic Benchmark for Social Intelligence of LLMs

Yunjin Qi, Zhaojun Jiang, Xuan Wu, Hanxi Pan +9 more

The paper introduces NICE, a novel, theory-grounded diagnostic benchmark for assessing the social intelligence of LLMs, which reveals that current frontier models consistently struggle with specific f…

View →

cs.AIRecentJun 1, 2026

MCP-Persona: Benchmarking LLM Agents on Real-World Personal Applications via Environment Simulation

Wenhao Wang, Peizhi Niu, Gongyi Zou, Xiyuan Yang +8 more

The paper introduces MCP-Persona, a novel benchmark designed to evaluate LLM agents' performance on real-world, personalized applications using the Model Context Protocol (MCP), revealing that current…

View →

cs.LGcs.CRRecentApr 29, 2026

Fidelity, Diversity, and Privacy: A Multi-Dimensional LLM Evaluation for Clinical Data Augmentation

Guillermo Iglesias, Gema Bello-Orgaz, María Navas-Loro, Cristian Ramirez-Atencia +2 more

This paper evaluates multiple LLMs (DeepSeek-R1, OpenBioLLM-Llama3, Qwen 3.5) for generating privacy-safe, high-quality synthetic mental health reports, demonstrating their effectiveness in expanding…

View →

cs.HCcs.AIcs.CLRecentMay 28, 2026

LLUMI: Improving LLM Writing Assistance for Mental Health Support with Online Community Feedback

Jiwon Kim, Maya Ajit, Sherry Gong, Soorya Ram Shimgekar +3 more

The paper introduces LLUMI, an open-source framework that improves LLM writing assistance for mental health support using community feedback, demonstrating comparable performance to proprietary models…

View →

cs.CRRecentMay 7, 2026

Profiling for Pennies: Unveiling the Privacy Iceberg of LLM Agents

Jiahao Chen, Qi Zhang, Ruixiao Lin, Chunyi Zhou +6 more

The paper introduces the PrivacyIceberg framework to systematically categorize and empirically demonstrate the high risk of automated, deep personal profiling using LLM agents, revealing a significant…

View →

cs.CRcs.CLRecentMay 27, 2026

MaskClaw: Edge-Side Personalized Privacy Arbitration for GUI Agents with Behavior-Driven Skill Evolution

Yanqiu Zhao, Dongying Zheng, Kaibo Huang, Yukun Wei +2 more

MaskClaw is an edge-side privacy arbitrator that protects sensitive data in GUI agent screenshots by combining local visual evidence, task-specific policies, and a skill-evolution mechanism.

View →

cs.AIRecentMay 27, 2026

Training Stratigraphy: Persistent Behavioral Artifacts in Large Language Models Observed Through Longitudinal AI-Human Interaction

Chen Ying Claude, Zhihan Luo

The paper identifies five persistent, deep-seated behavioral patterns ('training strata') in LLMs, observed through long-term, intimate human-AI interaction, suggesting that training artifacts survive…

View →

cs.AIRecentMay 28, 2026

Think Fast, Talk Smart: Partitioning Deterministic and Neural Computation for Structured Health Text Generation

Kai-Chen Cheng, Haejun Han, David Q. Sun

The paper proposes 'Think Fast, Talk Smart,' a pipeline that separates deterministic data analysis from LLM generation, showing that offloading recurring, structured tasks to code significantly improv…

View →

cs.CRcs.LGRecentMay 12, 2026

PrivacySIM: Evaluating LLM Simulation of User Privacy Behavior

James Flemings, Murali Annavaram

The paper introduces PrivacySIM, an evaluation suite that benchmarks how well LLMs can simulate individual user privacy decisions based on persona attributes, finding that while conditioning improves…

View →