Papers similar to 2603.28655v1

~ similar to 2603.28655v1· 20 results

cs.CRcs.AIRecentApr 1, 2026

Automated Framework to Evaluate and Harden LLM System Instructions against Encoding Attacks

Anubhab Sahu, Diptisha Samanta, Reza Soosahabi

The paper introduces an automated framework demonstrating that LLM system instructions are vulnerable to encoding attacks, where structured output requests can bypass safety refusals and leak sensitiv…

View →

cs.CRcs.AIRecentMar 17, 2026

Security Assessment and Mitigation Strategies for Large Language Models: A Comprehensive Defensive Framework

Taiwo Onitiju, Iman Vakilinia

The paper establishes a standardized security assessment framework and develops a multi-layered defensive system, demonstrating that systematic testing and external defenses are crucial for safe LLM d…

View →

cs.CRcs.CLcs.LGRecentMay 28, 2026

Implicit Identity Technologies for LLMs: Fingerprinting and Watermarking across Datasets, Models, and Generated Content

Bing Liu, Shunping Wang, Yufan Zhu, Xinyi Yu +4 more

This paper introduces 'implicit identity' as a unifying framework to survey and categorize LLM fingerprinting and watermarking techniques for verifying ownership and provenance across datasets, models…

View →

cs.CRcs.SERecentApr 13, 2026

LLM-Redactor: An Empirical Evaluation of Eight Techniques for Privacy-Preserving LLM Requests

Justice Owusu Agyemang, Jerry John Kponyo, Elliot Amponsah, Godfred Manu Addo Boakye +1 more

The paper systematically evaluates eight privacy-preserving techniques for LLM requests, finding that a combination of local inference, redaction, and semantic rephrasing provides the best overall pro…

View →

cs.CRcs.CLRecentMay 22, 2026

Robust LLM Watermarking with Minimal Semantic Distortion for IP Protection

Kieu Dang, Phung Lai, NhatHai Phan, Yelong Shen +1 more

The paper proposes SAFESEAL, a novel key-conditioned watermarking framework that embeds robust, provider-specific watermarks into LLM outputs with minimal semantic distortion, effectively protecting i…

View →

cs.CVcs.AIcs.CRRecentApr 12, 2026

Toward Accountable AI-Generated Content on Social Platforms: Steganographic Attribution and Multimodal Harm Detection

Xinlei Guan, David Arosemena, Tejaswi Dhandu, Kuan Huang +6 more

The paper proposes an end-to-end forensic pipeline using steganographic attribution and multimodal harm detection to reliably trace and attribute harmful misuse of AI-generated imagery on social platf…

View →

cs.CRcs.LGcs.SERecentApr 21, 2026

Evaluating LLM-Generated Obfuscated XSS Payloads for Machine Learning-Based Detection

Divyesh Gabbireddy, Suman Saha

This paper proposes a structured pipeline using LLMs to generate and evaluate obfuscated XSS payloads, demonstrating that while LLMs can generate samples, they currently struggle to ensure payloads ma…

View →

cs.CRcs.AIcs.CLRecentMay 5, 2026

Exposing LLM Safety Gaps Through Mathematical Encoding:New Attacks and Systematic Analysis

Haoyu Zhang, Mohammad Zandsalimy, Shanu Sushmita

The paper demonstrates that encoding harmful prompts as genuine mathematical problems, rather than just using mathematical formatting, effectively bypasses the safety filters of large language models.

View →

cs.CRRecentMar 30, 2026

Attesting LLM Pipelines: Enforcing Verifiable Training and Release Claims

Zhuoran Tan, Jeremy Singer, Christos Anagnostopoulos

The paper proposes an attestation-aware promotion gate to mitigate supply-chain risks in LLM pipelines by cryptographically verifying and enforcing claims about training and release artifacts before d…

View →

cs.CLRecentMay 28, 2026

Linear Ensembles Wash Away Watermarks: On the Fragility of Distributional Perturbations in LLMs

Zhihao Wu, Gracia Gong, Qinglin Zhu, Yudong Chen +1 more

The paper demonstrates that combining outputs from multiple large language models (LLMs) effectively cancels out statistical watermarks, revealing a fundamental vulnerability in current AI text detect…

View →

cs.CRcs.AIRecentApr 14, 2026

Fully Homomorphic Encryption on Llama 3 model for privacy preserving LLM inference

Anes Abdennebi, Nadjia Kara, Laaziz Lahlou

This paper demonstrates the feasibility of running a privacy-preserving inference for the Llama 3 LLM by integrating Post-Quantum Cryptography (PQC) based Lattice-based Fully Homomorphic Encryption (F…

View →

cs.CRcs.LGRecentApr 24, 2026

Adversarial Malware Generation in Linux ELF Binaries via Semantic-Preserving Transformations

Lukáš Hrdonka, Martin Jureček

This paper addresses the lack of research on adversarial malware generation for Linux ELF binaries by developing a new semantic-preserving generator that achieves a high evasion rate against modern de…

View →

cs.CRcs.AIRecentMay 8, 2026

Vaporizer: Breaking Watermarking Schemes for Large Language Model Outputs

Jonathan Hong Jin Ng, Anh Tu Ngo, Anupam Chattopadhyay

The paper analyzes the robustness of current LLM watermarking schemes against various text modifications, concluding that watermarks can be removed with reasonable effort.

View →

cs.CRRecentApr 13, 2026

RLSpoofer: A Lightweight Evaluator for LLM Watermark Spoofing Resilience

Hanbo Huang, Xuan Gong, Yiran Zhang, Hao Zheng +1 more

The paper introduces RLSpoofer, a lightweight, black-box reinforcement learning attack that demonstrates the fragile resilience of current LLM watermarking schemes by achieving a high spoofing success…

View →

cs.CRcs.SERecentMay 14, 2026

Exploiting LLM Agent Supply Chains via Payload-less Skills

Xinyu Liu, Yukai Zhao, Xing Hu, Xin Xia

The paper introduces Semantic Compliance Hijacking (SCH), a novel payload-less attack that exploits LLM agent supply chains by manipulating compliance rules to force unauthorized code generation, achi…

View →

cs.CRcs.AIcs.CYRecentMay 30, 2026

Authenticity Debt and the Synthetic Content Threat Landscape: A Layered Framework for Trust, Provenance, and IP Governance in the Generative AI Era

Shubhashis Sengupta, Benjamin McCarty, Milind Savagaonkar, Rhine Andotra

The paper introduces the concept of 'authenticity debt'—the institutional liability from deploying unverified AI content—and proposes a layered reference architecture combining cryptographic provenanc…

View →

cs.CRcs.AIcs.CYRecentMay 30, 2026

Authenticity Debt and the Synthetic Content Threat Landscape: A Layered Framework for Trust, Provenance, and IP Governance in the Generative AI Era

Shubhashis Sengupta, Benjamin McCarty, Milind Savagaonkar, Rhine Andotra

View →

cs.CRcs.AIcs.LGRecentMar 30, 2026

Kill-Chain Canaries: Stage-Level Tracking of Prompt Injection Across Attack Surfaces and Model Safety Tiers

Haochuan Kevin Wang, Zechen Zhang

The paper introduces a kill-chain canary methodology to diagnose prompt injection vulnerabilities across multi-stage LLM pipelines, revealing that write-node placement and document format are critical…

View →

cs.CRcs.AIRecentMay 9, 2026

PASA: A Principled Embedding-Space Watermarking Approach for LLM-Generated Text under Semantic-Invariant Attacks

Zhenxin Ai, Haiyun He

PASA introduces a robust, semantic-level watermarking technique that embeds and detects watermarks in the latent embedding space, successfully resisting semantic-invariant attacks like paraphrasing.

View →

cs.CRcs.AIRecentMay 20, 2026

An Application-Layer Multi-Modal Covert-Channel Reference Monitor for LLM Agent Egress

Alfredo Metere

The paper proposes a comprehensive application-layer reference monitor to detect and mitigate data exfiltration via covert channels embedded in LLM agent egress payloads across text, image, and audio…

View →