Papers similar to 2604.18649v1

~ similar to 2604.18649v1· 20 results

cs.CRcs.AIcs.CYRecentMay 30, 2026

Authenticity Debt and the Synthetic Content Threat Landscape: A Layered Framework for Trust, Provenance, and IP Governance in the Generative AI Era

Shubhashis Sengupta, Benjamin McCarty, Milind Savagaonkar, Rhine Andotra

The paper introduces the concept of 'authenticity debt'—the institutional liability from deploying unverified AI content—and proposes a layered reference architecture combining cryptographic provenanc…

View →

cs.CRcs.AIcs.CYRecentMay 30, 2026

Authenticity Debt and the Synthetic Content Threat Landscape: A Layered Framework for Trust, Provenance, and IP Governance in the Generative AI Era

Shubhashis Sengupta, Benjamin McCarty, Milind Savagaonkar, Rhine Andotra

View →

cs.CRcs.HCRecentJun 2, 2026

Generative AI-Enabled Refund Fraud in Chinese E-Commerce: Investigation on Merchants and Platform Workers

Shuning Zhang, Eve He, Xiao Zhan, Shijing He +3 more

This paper investigates how Generative AI enables scalable, hyper-realistic fraud in Chinese e-commerce by fabricating product defect evidence, proposing new defense mechanisms like verifiable materia…

View →

cs.CRcs.CVcs.CYRecentMay 20, 2026

Verifiable Provenance and Watermarking for Generative AI: An Evidentiary Framework for International Operational Law and Domestic Courts

Gustav Olaf Yunus Laitinen-Fredriksson Lundström-Imanov, Nurana Abdullayeva

The paper proposes a unified evidentiary framework combining cryptographic provenance, statistical watermarking, and zero-knowledge attestation to address the legal challenges posed by synthetic media…

View →

cs.CRRecentMay 13, 2026

From Compression to Accountability: Harmless Copyright Protection for Dataset Distillation

Yan Liang, Ziyuan Yang, Mengyu Sun, Joey Tianyi Zhou +1 more

The paper proposes SubPopMark, a novel subpopulation-driven framework that injects harmless, verifiable markers into distilled datasets to prevent copyright infringement and data leakage.

View →

cs.CRRecentApr 23, 2026

Black-Box Skill Stealing Attack from Proprietary LLM Agents: An Empirical Study

Zihan Wang, Rui Zhang, Yu Liu, Chi Liu +3 more

This paper presents the first systematic study of black-box skill stealing attacks against proprietary LLM agents, demonstrating that structured agent skills can be easily extracted, posing a signific…

View →

cs.CRRecentMar 30, 2026

Attesting LLM Pipelines: Enforcing Verifiable Training and Release Claims

Zhuoran Tan, Jeremy Singer, Christos Anagnostopoulos

The paper proposes an attestation-aware promotion gate to mitigate supply-chain risks in LLM pipelines by cryptographically verifying and enforcing claims about training and release artifacts before d…

View →

cs.CRcs.AIcs.LGRecentMay 8, 2026

Defense effectiveness across architectural layers: a mechanistic evaluation of persistent memory attacks on stateful LLM agents

Jun Wen Leong

The paper systematically evaluates various defense mechanisms against persistent memory attacks on LLM agents, finding that only tool-gating at the memory layer (Memory Sandbox) effectively mitigates…

View →

cs.CRcs.AIcs.CLRecentMar 25, 2026

AI Security in the Foundation Model Era: A Comprehensive Survey from a Unified Perspective

Zhenyi Wang, Siyu Luan

The paper proposes a unified closed-loop threat taxonomy to systematically analyze and defend foundation models by explicitly framing the bidirectional security interactions between data and models.

View →

cs.CRcs.CLRecentApr 14, 2026

TimeMark: A Trustworthy Time Watermarking Framework for Exact Generation-Time Recovery from AIGC

Shangkun Che, Silin Du, Ge Gao

TimeMark proposes a trustworthy time watermarking framework that uses cryptographic techniques and error-correcting codes to achieve 100% accurate recovery of the generation time from AIGC, resisting…

View →

cs.CRcs.AIcs.CLRecentMar 30, 2026

Trojan-Speak: Bypassing Constitutional Classifiers with No Jailbreak Tax via Adversarial Finetuning

Bilgehan Sel, Xuanli He, Alwin Peng, Ming Jin +1 more

The paper introduces Trojan-Speak, an adversarial fine-tuning method that successfully bypasses advanced LLM safety classifiers (like Anthropic's Constitutional Classifiers) with minimal degradation t…

View →

cs.CRRecentMar 21, 2026

ACRFence: Preventing Semantic Rollback Attacks in Agent Checkpoint-Restore

Yusheng Zheng, Yiwei Yang, Wei Zhang, Andi Quinn

ACRFence introduces a framework-agnostic mitigation to prevent semantic rollback attacks in LLM agents by recording irreversible tool effects and enforcing strict replay-or-fork semantics upon checkpo…

View →

cs.AIcs.CRq-fin.RMRecentJun 2, 2026

From Control Boundary to Insurance Claim: Reconstructing AI-Mediated Losses Through the CER Framework

Alex Leung, Rex Zhang, Kentaroh Toyoda, SiewMei Loh

This paper introduces the CER framework to address the complex problem of reconstructing AI-mediated losses for insurance claims, moving beyond simple event reconstruction to analyze the system's oper…

View →

cs.CVcs.AIcs.CRRecentApr 12, 2026

Toward Accountable AI-Generated Content on Social Platforms: Steganographic Attribution and Multimodal Harm Detection

Xinlei Guan, David Arosemena, Tejaswi Dhandu, Kuan Huang +6 more

The paper proposes an end-to-end forensic pipeline using steganographic attribution and multimodal harm detection to reliably trace and attribute harmful misuse of AI-generated imagery on social platf…

View →

cs.CRcs.AIRecentApr 13, 2026

Beyond A Fixed Seal: Adaptive Stealing Watermark in Large Language Models

Shuhao Zhang, Yuli Chen, Jiale Han, Bo Cheng +1 more

The paper proposes Adaptive Stealing (AS), a novel and more robust watermark stealing algorithm that dynamically selects optimal attack perspectives to significantly increase the efficiency of comprom…

View →

cs.CRcs.SERecentMay 4, 2026

A Validated Prompt Bank for Malicious Code Generation: Separating Executable Weapons from Security Knowledge in 1,554 Consensus-Labeled Prompts

Richard J. Young, Gregory D. Moody

The paper introduces a validated, consensus-labeled prompt bank that separates requests for executable malicious code (weapons) from requests for general harmful security knowledge, providing a more g…

View →

cs.CRcs.AIRecentApr 2, 2026

Combating Data Laundering in LLM Training

Muxing Li, Zesheng Ye, Sharon Li, Feng Liu

The paper introduces Synthesis Data Reversion (SDR), a method that infers the data laundering transformation used in LLM training and synthesizes queries to restore the detection signals lost when pro…

View →

cs.CRcs.AIcs.LGRecentMay 10, 2026

Position: AI Security Policy Should Target Systems, Not Models

Michael A. Riegler, Inga Strümke

The paper demonstrates that advanced capabilities, such as jailbreaking large language models and finding software vulnerabilities, can be achieved effectively at zero cost by coordinating multiple sm…

View →

cs.CVcs.CRcs.LGRecentApr 30, 2026

Machine Unlearning for Class Removal through SISA-based Deep Neural Network Architectures

Ishrak Hamim Mahi, Siam Ferdous, Md Sakib Sadman Badhon, Nabid Hasan Omi +3 more

This paper proposes a modified SISA framework to achieve efficient class-level unlearning in CNNs, allowing the removal of specific data influence without full model retraining.

View →

cs.CRcs.AIRecentMay 18, 2026

Prompts Don't Protect: Architectural Enforcement via MCP Proxy for LLM Tool Access Control

Rohith Uppala

The paper proposes an architectural proxy (MCP) to enforce robust, reliable tool access control for LLM agents, demonstrating that this structural enforcement is necessary because prompt-based restric…

View →