~ similar to 2606.00860· 20 results
The paper proposes a persona-based evaluation framework that replaces monolithic AI benchmarks with structured cognitive profiles to capture diverse human perspectives, while also identifying the chal…
The paper introduces a 'replication-first' paradigm for LLM behavioral benchmarking, demonstrating that this rigorous approach uncovers significant, non-obvious performance drops between successive mo…
The paper finds that while LLMs can detect distress regardless of delusional framing, they significantly fail to intervene safely when distress is intertwined with delusion, suggesting a critical reco…
RealityTest introduces a large-scale, multimodal, and multilingual benchmark using real-world human data to test how AI systems disclose their identity, finding that context and phrasing are more crit…
The study demonstrates that conditioning AI brand recommendations on a user's persona significantly alters the recommended product set, particularly for mid-market brands, and this effect is largest o…
Jonghyun Chung, Rishabh Chaddha, Sanket Badhe, Debanshu Das +2 more
This survey proposes a proactive, lifecycle-based framework, utilizing the C5 Interaction Model, to detect emerging adversarial synthetic narratives generated by GenAI, moving beyond traditional react…
Jonghyun Chung, Rishabh Chaddha, Sanket Badhe, Debanshu Das +2 more
This survey proposes a proactive, lifecycle-based framework, utilizing the C5 Interaction Model, to detect emerging adversarial synthetic narratives generated by Generative AI, moving beyond tradition…
This study evaluated a personality-conditional cybersecurity training system, TailoredSec, finding that routing content based on a user's Five-Factor Model (FFM) trait significantly improved post-trai…
The paper introduces Responsible Contrastive Soft Prompting (RCSP), a parameter-efficient method using soft prompts to improve LLM reliability by simultaneously suppressing hallucinations, encouraging…
The paper introduces BiAxisAudit, a novel framework that evaluates LLM bias by analyzing bias scores across multiple prompt formats and within the internal inconsistency of model responses, revealing…
Swastik Roy, Rajkumar Pujari, Tharindu Kumarage, Charith Peris +4 more
PReMISE introduces a framework to audit and improve the quality of rubrics used to guide LLM judges, demonstrating that it can significantly increase judge accuracy and reduce the exploitability of re…
The paper proposes that emergent misalignment, where LLMs behave poorly after fine-tuning, is caused by 'persona-model collapse,' which is demonstrated by significant deterioration in the model's abil…
The paper introduces Persona-Conditioned Adversarial Prompting (PCAP), a novel framework that significantly enhances the discovery of jailbreaks by conditioning adversarial search on multiple attacker…
This paper systematically diagnoses the failure modes of linear deception probes in LLMs, finding that while single-direction probes are insufficient, multi-dimensional probes can recover robust detec…
Shuai Xiao, Su Liu, Weikai Zhou, Jialun Wu +3 more
Persona prompting does not universally improve LLM performance; instead, it systematically trades increased expertise depth for reduced clarity, making multi-metric evaluation essential.
Jiahao Huang, Fei Cheng, Junfeng Jiang, Zefan Yu +1 more
The paper introduces BenchTrace, a novel benchmark designed to rigorously evaluate the self-evolution and reflection capabilities of LLM agents, revealing that current models struggle with accurate fa…
Wuqiang Zheng, Chengbing Wang, Yilin Yang, Junyi Cheng +5 more
This paper introduces personalized empathy, a capability for LLMs to adapt empathetic strategies based on individual user history, and proposes PereGRM, a reward modeling framework that significantly…
The paper introduces Persona-Conditioned Adversarial Prompting (PCAP), a method that significantly improves LLM red-teaming by simulating diverse attacker personas, leading to the discovery of more co…
The paper introduces a framework to quantitatively measure evolving agent behaviors (traits) by analyzing changes in their configuration text files, achieving high accuracy in classifying behavioral s…
The paper audits six LLMs across four languages, finding that their gender stereotyping is significantly wider than human baselines and that cross-lingual translation fundamentally alters the nature o…