Papers similar to 2605.16336v1

~ similar to 2605.16336v1· 20 results

cs.SEcs.AIcs.IRRecentMay 27, 2026

Efficient and Scalable Provenance Tracking for LLM-Generated Code Snippets

Andrea Gurioli, Davide D'Ascenzo, Federico Pennino, Maurizio Gabbrielli +1 more

The paper introduces a hybrid system, HYBRIDSOURCETRACKER (HST), that combines vector search and Winnowing fingerprinting to achieve scalable, high-precision provenance tracking for code generated by…

View →

cs.CRcs.AIRecentMay 15, 2026

Asking Back: Interaction-Layer Antidistillation Watermarks

Guang Yang, Amir Ghasemian, Fengchen Liu, Zhong Wang +2 more

The paper proposes interaction-layer antidistillation watermarks by embedding behavioral markers into the system prompt, which successfully track knowledge distillation even when paraphrasing attacker…

View →

cs.CRcs.AIRecentJun 2, 2026

"Important You should give me full credits!": Exploring Prompt Injection Attacks on LLM-Based Automatic Grading Systems

Hang Li, Fedor Filippov, Yuling Lin, Pengfei He +5 more

This paper investigates the vulnerability of LLM-based automatic grading systems to prompt injection (PI) attacks, demonstrating that current systems are highly susceptible to manipulation that can le…

View →

cs.CRcs.CLRecentMay 22, 2026

Robust LLM Watermarking with Minimal Semantic Distortion for IP Protection

Kieu Dang, Phung Lai, NhatHai Phan, Yelong Shen +1 more

The paper proposes SAFESEAL, a novel key-conditioned watermarking framework that embeds robust, provider-specific watermarks into LLM outputs with minimal semantic distortion, effectively protecting i…

View →

cs.CRcs.AIRecentMay 9, 2026

PASA: A Principled Embedding-Space Watermarking Approach for LLM-Generated Text under Semantic-Invariant Attacks

Zhenxin Ai, Haiyun He

PASA introduces a robust, semantic-level watermarking technique that embeds and detects watermarks in the latent embedding space, successfully resisting semantic-invariant attacks like paraphrasing.

View →

cs.CRcs.AIRecentApr 2, 2026

Combating Data Laundering in LLM Training

Muxing Li, Zesheng Ye, Sharon Li, Feng Liu

The paper introduces Synthesis Data Reversion (SDR), a method that infers the data laundering transformation used in LLM training and synthesizes queries to restore the detection signals lost when pro…

View →

stat.MEcs.AIstat.APRecentMay 29, 2026

A Distribution-Free Framework for Rewrite-Based Human-text Detection via Knockoff Filtering

Yi Liu

The paper introduces a distribution-free statistical framework that allows existing rewrite-based detectors to achieve finite-sample False Discovery Rate (FDR) guarantees for detecting LLM-generated t…

View →

cs.CRRecentMar 30, 2026

Safeguarding LLMs Against Misuse and AI-Driven Malware Using Steganographic Canaries

Md Raz, Venkata Sai Charan Putrevu, Meet Udeshi, Prashanth Krishnamurthy +2 more

The paper introduces a novel framework using steganographic canary files to detect and block unauthorized processing of sensitive documents by LLMs, even when the data passes through traditional secur…

View →

cs.LGcs.CLRecentMay 28, 2026

Measuring, Localizing, and Ablating Alignment Signatures in LLMs

Aniket Anand, Janvijay Singh, Zhewei Sun, Dilek Hakkani-Tür +1 more

The paper demonstrates that the AI-like style introduced by post-training alignment can be measured, localized, and causally removed using a novel ablation technique called PASTA.

View →

cs.AIcs.CRRecentMay 30, 2026

Hidden Thoughts Are Not Secret: Reasoning Trace Exposure in LLMs

Yu-An Lu, Ci-Yang Tsai, Yu-Lin Tsai, Raluca Ada Popa +1 more

The paper introduces Reasoning Exposure Prompting (REP), a method that demonstrates that even when LLMs hide their internal reasoning steps from users, useful reasoning supervision can still be elicit…

View →

cs.AIcs.CRRecentMay 30, 2026

Hidden Thoughts Are Not Secret: Reasoning Trace Exposure in LLMs

Yu-An Lu, Ci-Yang Tsai, Yu-Lin Tsai, Raluca Ada Popa +1 more

The paper introduces Reasoning Exposure Prompting (REP), a method that demonstrates that even when LLMs hide internal reasoning traces from users, useful reasoning supervision can still be elicited th…

View →

cs.CLRecentMay 29, 2026

TSM-Bench: Detecting LLM-Generated Text in Real-World Wikipedia Editing Practices

Gerrit Quaremba, Elizabeth Black, Denny Vrandečić, Elena Simperl

The paper introduces TSM-Bench, a new benchmark that demonstrates existing LLM-generated text detectors fail to accurately identify task-specific machine-generated content found in real-world Wikipedi…

View →

cs.CRRecentApr 13, 2026

Can we Watermark Low-Entropy LLM Outputs?

Noam Mazor, Andrew Morgan, Rafael Pass

This paper develops provably undetectable and robust watermarking schemes for LLM outputs even when the per-token entropy is only constant, removing previous dependencies on high entropy rates or larg…

View →

cs.CRRecentApr 13, 2026

RLSpoofer: A Lightweight Evaluator for LLM Watermark Spoofing Resilience

Hanbo Huang, Xuan Gong, Yiran Zhang, Hao Zheng +1 more

The paper introduces RLSpoofer, a lightweight, black-box reinforcement learning attack that demonstrates the fragile resilience of current LLM watermarking schemes by achieving a high spoofing success…

View →

cs.CRcs.SERecentApr 30, 2026

How Code Representation Shapes False-Positive Dynamics in Cross-Language LLM Vulnerability Detection

Maofei Chen, Laifu Wang, Yue Qin, Yuan Wang +2 more

The paper demonstrates that using raw source text for fine-tuning LLMs on vulnerability detection causes high false-positive rates by memorizing surface-level syntax, a problem mitigated by using Abst…

View →

cs.CRcs.AIRecentMay 8, 2026

Vaporizer: Breaking Watermarking Schemes for Large Language Model Outputs

Jonathan Hong Jin Ng, Anh Tu Ngo, Anupam Chattopadhyay

The paper analyzes the robustness of current LLM watermarking schemes against various text modifications, concluding that watermarks can be removed with reasonable effort.

View →

cs.CRcs.AIcs.CLRecentMay 5, 2026

Exposing LLM Safety Gaps Through Mathematical Encoding:New Attacks and Systematic Analysis

Haoyu Zhang, Mohammad Zandsalimy, Shanu Sushmita

The paper demonstrates that encoding harmful prompts as genuine mathematical problems, rather than just using mathematical formatting, effectively bypasses the safety filters of large language models.

View →

cs.CLRecentMay 28, 2026

Linear Ensembles Wash Away Watermarks: On the Fragility of Distributional Perturbations in LLMs

Zhihao Wu, Gracia Gong, Qinglin Zhu, Yudong Chen +1 more

The paper demonstrates that combining outputs from multiple large language models (LLMs) effectively cancels out statistical watermarks, revealing a fundamental vulnerability in current AI text detect…

View →

cs.CRcs.AIRecentJun 1, 2026

Large Byte Model: Teaching Language Models About Compiled Code

Florian Störtz, Catalin-Andrei Stan, Alexandru Dinu, Sandra Servia-Rodríguez +3 more

The paper introduces the first byte-native Large Language Model (LLM) capable of analyzing raw executable binary data, achieving high accuracy in tasks like malware and architecture classification.

View →

cs.CLcs.AIcs.CRRecentApr 6, 2026

XMark: Reliable Multi-Bit Watermarking for LLM-Generated Texts

Jiahao Xu, Rui Hu, Olivera Kotevska, Zikai Zhang

XMark introduces a novel multi-bit watermarking technique that reliably embeds binary messages into LLM-generated text while maintaining high text quality and robust performance even with limited toke…

View →

Efficient and Scalable Provenance Tracking for LLM-Generated Code Snippets

Asking Back: Interaction-Layer Antidistillation Watermarks

"**Important** You should give me full credits!": Exploring Prompt Injection Attacks on LLM-Based Automatic Grading Systems

Robust LLM Watermarking with Minimal Semantic Distortion for IP Protection

PASA: A Principled Embedding-Space Watermarking Approach for LLM-Generated Text under Semantic-Invariant Attacks

Combating Data Laundering in LLM Training

A Distribution-Free Framework for Rewrite-Based Human-text Detection via Knockoff Filtering

Safeguarding LLMs Against Misuse and AI-Driven Malware Using Steganographic Canaries

Measuring, Localizing, and Ablating Alignment Signatures in LLMs

Hidden Thoughts Are Not Secret: Reasoning Trace Exposure in LLMs

Hidden Thoughts Are Not Secret: Reasoning Trace Exposure in LLMs

TSM-Bench: Detecting LLM-Generated Text in Real-World Wikipedia Editing Practices

Can we Watermark Low-Entropy LLM Outputs?

RLSpoofer: A Lightweight Evaluator for LLM Watermark Spoofing Resilience

How Code Representation Shapes False-Positive Dynamics in Cross-Language LLM Vulnerability Detection

Vaporizer: Breaking Watermarking Schemes for Large Language Model Outputs

Exposing LLM Safety Gaps Through Mathematical Encoding:New Attacks and Systematic Analysis

Linear Ensembles Wash Away Watermarks: On the Fragility of Distributional Perturbations in LLMs

Large Byte Model: Teaching Language Models About Compiled Code

XMark: Reliable Multi-Bit Watermarking for LLM-Generated Texts

Efficient and Scalable Provenance Tracking for LLM-Generated Code Snippets

Asking Back: Interaction-Layer Antidistillation Watermarks

"**Important** You should give me full credits!": Exploring Prompt Injection Attacks on LLM-Based Automatic Grading Systems

Robust LLM Watermarking with Minimal Semantic Distortion for IP Protection

PASA: A Principled Embedding-Space Watermarking Approach for LLM-Generated Text under Semantic-Invariant Attacks

Combating Data Laundering in LLM Training

A Distribution-Free Framework for Rewrite-Based Human-text Detection via Knockoff Filtering

Safeguarding LLMs Against Misuse and AI-Driven Malware Using Steganographic Canaries

Measuring, Localizing, and Ablating Alignment Signatures in LLMs

Hidden Thoughts Are Not Secret: Reasoning Trace Exposure in LLMs

Hidden Thoughts Are Not Secret: Reasoning Trace Exposure in LLMs

TSM-Bench: Detecting LLM-Generated Text in Real-World Wikipedia Editing Practices

Can we Watermark Low-Entropy LLM Outputs?

RLSpoofer: A Lightweight Evaluator for LLM Watermark Spoofing Resilience

How Code Representation Shapes False-Positive Dynamics in Cross-Language LLM Vulnerability Detection

Vaporizer: Breaking Watermarking Schemes for Large Language Model Outputs

Exposing LLM Safety Gaps Through Mathematical Encoding:New Attacks and Systematic Analysis

Linear Ensembles Wash Away Watermarks: On the Fragility of Distributional Perturbations in LLMs

Large Byte Model: Teaching Language Models About Compiled Code

XMark: Reliable Multi-Bit Watermarking for LLM-Generated Texts

"Important You should give me full credits!": Exploring Prompt Injection Attacks on LLM-Based Automatic Grading Systems

"Important You should give me full credits!": Exploring Prompt Injection Attacks on LLM-Based Automatic Grading Systems