Papers similar to 2604.17133v1

~ similar to 2604.17133v1· 20 results

cs.CRcs.AIRecentApr 10, 2026

ADAM: A Systematic Data Extraction Attack on Agent Memory via Adaptive Querying

Xingyu Lyu, Jianfeng He, Ning Wang, Yidan Hu +4 more

The paper proposes ADAM, a novel and highly effective privacy attack that systematically extracts sensitive data from LLM agent memory by adaptively querying the victim agent's memory based on data di…

View →

cs.LGcs.AIcs.CRRecentApr 17, 2026

DPrivBench: Benchmarking LLMs' Reasoning for Differential Privacy

Erchi Wang, Pengrun Huang, Eli Chien, Om Thakkar +3 more

The paper introduces DPrivBench, a new benchmark to test whether large language models (LLMs) can automate the complex reasoning required to verify differential privacy guarantees for algorithms.

View →

cs.AIRecentMay 28, 2026

Think Fast, Talk Smart: Partitioning Deterministic and Neural Computation for Structured Health Text Generation

Kai-Chen Cheng, Haejun Han, David Q. Sun

The paper proposes 'Think Fast, Talk Smart,' a pipeline that separates deterministic data analysis from LLM generation, showing that offloading recurring, structured tasks to code significantly improv…

View →

cs.CRcs.AIRecentJun 3, 2026

SharedRequest: Privacy-Preserving Model-Agnostic Inference for Large Language Models

Peihua Mai, Xuanrong Gao, Youlong Ding, Xianglong Du +2 more

SharedRequest introduces a model-agnostic framework that enhances LLM privacy and efficiency by batching and mixing prompts with noisy variants, achieving high utility and significant cost reduction.

View →

cs.CRRecentApr 26, 2026

LLM-CEG: Extending the Classification Error Gauge Framework for Privacy Auditing of Large Language Models

Kato Mivule

The paper introduces LLM-CEG, an extended framework that uses membership inference attack success rates and model perplexity to systematically audit and optimize the privacy-utility trade-off when fin…

View →

cs.AIRecentMay 28, 2026

Selective QA over Conflicting Multi-Source Personal Memory: A Diagnostic Testbed and Method Comparison

Tiancheng Yang, Matthias Schonlau, Ilia Sucholutsky

The paper introduces a diagnostic benchmark for selective Question Answering over conflicting, multi-source personal memory, demonstrating that specialized fusion resolvers outperform general LLMs, es…

View →

cs.AIRecentMay 28, 2026

VitalAgent: A Tool-Augmented Agent for Reactive and Proactive Physiological Monitoring over Wearable Health Data

Di Zhu, Yu Yvonne Wu, Hong Jia, Aaqib Saeed +2 more

VitalAgent is a novel tool-augmented agentic framework that significantly improves physiological monitoring from wearable health data by enabling both reactive question answering and proactive, long-t…

View →

cs.CLRecentMay 31, 2026

HypothesisMed: Inference-Time Answer Fusion and Structured Hypothesis-Space Reporting for Biomedical Question Answering

Md Motaleb Hossen Manik, Ge Wang

HypothesisMed introduces an inference-time pipeline for biomedical question answering that improves model reliability and structured output generation by fusing multiple model outputs and diagnosing t…

View →

cs.CRcs.AIRecentApr 16, 2026

CAMP: Cumulative Agentic Masking and Pruning for Privacy Protection in Multi-Turn LLM Conversations

Aman Panjwani

The paper proposes CAMP, a cross-turn privacy framework that mitigates Cumulative PII Exposure (CPE) in multi-turn LLM conversations by tracking and masking accumulated personal data across the entire…

View →

cs.CLRecentMay 31, 2026

DrugClaw and DrugAudit: A Primary-Source-Grounded Agent and Authority-Aware Benchmark for Drug-Information Question Answering

Qing Wang, Bo Li, Jialu Liang, Daling Shi +2 more

The paper introduces DrugClaw, a multi-agent system, and DrugAudit, a new benchmark, demonstrating that DrugClaw excels at answering drug-related questions by grounding answers in primary regulatory s…

View →

cs.CRcs.LGRecentMar 24, 2026

Privacy-Preserving EHR Data Transformation via Geometric Operators: A Human-AI Co-Design Technical Report

Maolin Wang, Beining Bao, Gan Yuan, Hongyu Chen +8 more

The paper proposes a novel data transformation framework that creates semantically rich, privacy-preserving numeric views of EHR data, enabling large-scale research while provably breaking patient lin…

View →

cs.IRcs.AIRecentMay 29, 2026

Fighting Numerical Hallucinations via Data-centric Compilation for Online Financial QA

Hao Chen, Xing Tang, Qirui Liu, Weijie Shi +5 more

The paper introduces the Data-centric Reasoning Compiler (DCRC), a novel data-driven framework that enhances financial QA systems by compiling user queries and retrieved documents into verifiable, exe…

View →

cs.CRcs.AIRecentJun 2, 2026

Need to Know: Contextual-Integrity-Grounded Query Rewriting for Privacy-Conscious LLM Delegation

Xinyue Huang, Xiaochun Cao, Wenyuan Yang

The paper introduces a Contextual Integrity (CI) framework and a new benchmark (DelegateCI-Bench) to rewrite user queries sent to cloud LLMs, ensuring only task-essential information is retained while…

View →

cs.HCcs.AIcs.CLRecentMay 28, 2026

LLUMI: Improving LLM Writing Assistance for Mental Health Support with Online Community Feedback

Jiwon Kim, Maya Ajit, Sherry Gong, Soorya Ram Shimgekar +3 more

The paper introduces LLUMI, an open-source framework that improves LLM writing assistance for mental health support using community feedback, demonstrating comparable performance to proprietary models…

View →

cs.CRcs.AIcs.OSRecentApr 21, 2026

An AI Agent Execution Environment to Safeguard User Data

Robert Stanley, Avi Verma, Lillian Tsai, Konstantinos Kallas +1 more

The paper introduces GAAP, an execution environment that deterministically guarantees the confidentiality of private user data by enforcing user-defined permission specifications on AI agents, even ag…

View →

cs.AIcs.CLcs.CYRecentMay 27, 2026

MIRA: A Bilingual Benchmark for Medical Information Response Audit

Mengyu Xu, Qiaoxin Yang, Qianqian Wang, Xiwei Dai +2 more

The paper introduces MIRA, a bilingual benchmark that reveals that LLMs tend to dilute or omit critical medical information when responding to prompts from users with low health literacy, a pattern te…

View →

cs.CRcs.LGRecentMay 23, 2026

CyberMaskQA: A Privacy-Aware Benchmark for Evaluating Large Language Models in Cybersecurity Question Answering

Matilda Gaddi, Jin Noh, Onat Gungor, Tajana Rosing

The paper introduces CYBERMASKQA, a novel privacy-aware benchmark designed to evaluate Large Language Models' ability to perform accurate cybersecurity question answering while simultaneously preservi…

View →

cs.HCcs.CRRecentMay 11, 2026

When Are LLM Inferences Acceptable? User Reactions and Control Preferences for Inferred Personal Information

Kyzyl Monteiro, Minjung Park, Alexander Ioffrida, Angelina Sanna +5 more

This study investigated user reactions to inferred personal information from their own ChatGPT histories, finding that acceptability is governed by context-sensitive norms regarding generation, retent…

View →

cs.CLRecentMay 31, 2026

Benchmarking Local LLMs for Natural-Language-to-SQL Querying in Biopharmaceutical Manufacturing: An Empirical Benchmark on Consumer-Grade Hardware

Sagar Bhetwal, Rajan Bastakoti, Nirajan Acharya, Gaurav Kumar Gupta

This study benchmarks four local LLMs for natural-language-to-SQL querying in biopharma manufacturing, finding that general-purpose code-tuned models like Llama 3.1 8B and Qwen 2.5 Coder 7B outperform…

View →