Similar to 2604.17133v1 · 20 results
Xingyu Lyu, Jianfeng He, Ning Wang, Yidan Hu +4 more
The paper proposes ADAM, a novel and highly effective privacy attack that systematically extracts sensitive data from LLM agent memory by adaptively querying the victim agent's memory based on data di…
Erchi Wang, Pengrun Huang, Eli Chien, Om Thakkar +3 more
The paper introduces DPrivBench, a new benchmark to test whether large language models (LLMs) can automate the complex reasoning required to verify differential privacy guarantees for algorithms.
The paper proposes 'Think Fast, Talk Smart,' a pipeline that separates deterministic data analysis from LLM generation, showing that offloading recurring, structured tasks to code significantly improv…
Peihua Mai, Xuanrong Gao, Youlong Ding, Xianglong Du +2 more
SharedRequest introduces a model-agnostic framework that enhances LLM privacy and efficiency by batching and mixing prompts with noisy variants, achieving high utility and significant cost reduction.
The paper introduces LLM-CEG, an extended framework that uses membership inference attack success rates and model perplexity to systematically audit and optimize the privacy-utility trade-off when fin…
The paper introduces a diagnostic benchmark for selective Question Answering over conflicting, multi-source personal memory, demonstrating that specialized fusion resolvers outperform general LLMs, es…
Di Zhu, Yu Yvonne Wu, Hong Jia, Aaqib Saeed +2 more
VitalAgent is a novel tool-augmented agentic framework that significantly improves physiological monitoring from wearable health data by enabling both reactive question answering and proactive, long-t…
HypothesisMed introduces an inference-time pipeline for biomedical question answering that improves model reliability and structured output generation by fusing multiple model outputs and diagnosing t…
The paper proposes CAMP, a cross-turn privacy framework that mitigates Cumulative PII Exposure (CPE) in multi-turn LLM conversations by tracking and masking accumulated personal data across the entire…
Qing Wang, Bo Li, Jialu Liang, Daling Shi +2 more
The paper introduces DrugClaw, a multi-agent system, and DrugAudit, a new benchmark, demonstrating that DrugClaw excels at answering drug-related questions by grounding answers in primary regulatory s…
Maolin Wang, Beining Bao, Gan Yuan, Hongyu Chen +8 more
The paper proposes a novel data transformation framework that creates semantically rich, privacy-preserving numeric views of EHR data, enabling large-scale research while provably breaking patient lin…
Hao Chen, Xing Tang, Qirui Liu, Weijie Shi +5 more
The paper introduces the Data-centric Reasoning Compiler (DCRC), a novel data-driven framework that enhances financial QA systems by compiling user queries and retrieved documents into verifiable, exe…
The paper introduces a Contextual Integrity (CI) framework and a new benchmark (DelegateCI-Bench) to rewrite user queries sent to cloud LLMs, ensuring only task-essential information is retained while…
Jiwon Kim, Maya Ajit, Sherry Gong, Soorya Ram Shimgekar +3 more
The paper introduces LLUMI, an open-source framework that improves LLM writing assistance for mental health support using community feedback, demonstrating comparable performance to proprietary models…
Robert Stanley, Avi Verma, Lillian Tsai, Konstantinos Kallas +1 more
The paper introduces GAAP, an execution environment that deterministically guarantees the confidentiality of private user data by enforcing user-defined permission specifications on AI agents, even ag…
Mengyu Xu, Qiaoxin Yang, Qianqian Wang, Xiwei Dai +2 more
The paper introduces MIRA, a bilingual benchmark that reveals that LLMs tend to dilute or omit critical medical information when responding to prompts from users with low health literacy, a pattern te…
The paper introduces CYBERMASKQA, a novel privacy-aware benchmark designed to evaluate Large Language Models' ability to perform accurate cybersecurity question answering while simultaneously preservi…
This study investigated user reactions to inferred personal information from their own ChatGPT histories, finding that acceptability is governed by context-sensitive norms regarding generation, retent…
This study benchmarks four local LLMs for natural-language-to-SQL querying in biopharma manufacturing, finding that general-purpose code-tuned models like Llama 3.1 8B and Qwen 2.5 Coder 7B outperform…
The paper proposes RPSG, a method that uses private seeds and differential privacy to generate highly realistic and strongly privacy-preserving synthetic data replicas of private text for LLMs.