Papers similar to 2606.04459v1

~ similar to 2606.04459v1· 20 results

cs.CRcs.CLRecentApr 28, 2026

The Surprising Universality of LLM Outputs: A Real-Time Verification Primitive

The paper identifies a universal, statistically predictable distribution (Mandelbrot) governing LLM outputs, enabling a highly efficient, model-agnostic scoring primitive for provenance and quality as…

View →

cs.CRcs.AIcs.LGRecentMar 24, 2026

Not All Tokens Are Created Equal: Query-Efficient Jailbreak Fuzzing for LLMs

Wenyu Chen, Xiangtao Meng, Chuanchao Zang, Li Wang +5 more

The paper proposes TriageFuzz, a token-aware fuzzing framework that significantly reduces the number of queries needed to jailbreak LLMs while maintaining high attack success rates.

View →

cs.CRcs.AIRecentMay 28, 2026

KBF: Knowledge Boundary as Fingerprint for Language Model and Black-Box API Auditing

Yijia Fang, Yiqing Feng, Bingyu Li, Mingxun Zhou

The paper introduces KBF, a low-cost black-box auditing protocol that fingerprints LLM APIs by analyzing stable numerical recall near the knowledge boundary, successfully detecting numerous model subs…

View →

cs.CRcs.AIRecentMay 28, 2026

KBF: Knowledge Boundary as Fingerprint for Language Model and Black-Box API Auditing

Yijia Fang, Yiqing Feng, Bingyu Li, Mingxun Zhou

The paper introduces KBF, a novel black-box auditing protocol that fingerprints LLM APIs by analyzing stable numerical recall near the knowledge boundary, effectively detecting model substitutions and…

View →

cs.LGcs.AIcs.CRRecentMay 7, 2026

PACZero: PAC-Private Fine-Tuning of Language Models via Sign Quantization

Murat Bilgehan Ertan, Xiaochen Zhu, Phuong Ha Nguyen, Marten van Dijk +1 more

The paper introduces PACZero, a novel PAC-private fine-tuning mechanism that achieves usable utility for large language models while providing strong resistance against membership-inference attacks.

View →

cs.CRcs.AIcs.CLRecentMay 5, 2026

Exposing LLM Safety Gaps Through Mathematical Encoding:New Attacks and Systematic Analysis

Haoyu Zhang, Mohammad Zandsalimy, Shanu Sushmita

The paper demonstrates that encoding harmful prompts as genuine mathematical problems, rather than just using mathematical formatting, effectively bypasses the safety filters of large language models.

View →

cs.CRcs.AIRecentApr 20, 2026

Understanding Secret Leakage Risks in Code LLMs: A Tokenization Perspective

Meifang Chen, Zhe Yang, Huang Nianchen, Yizhan Huang +3 more

This paper investigates how Byte-Pair Encoding (BPE) tokenization causes Code LLMs to disproportionately memorize certain types of secrets, a phenomenon termed 'gibberish bias'.

View →

cs.CRRecentApr 19, 2026

GuardPhish: Securing Open-Source LLMs from Phishing Abuse

Rina Mishra, Gaurav Varshney, Doddipatla Sesha Sahithi

The paper introduces GuardPhish, a large-scale dataset and evaluation framework, demonstrating that even high-performing open-source LLMs can generate actionable phishing content despite accurate inte…

View →

cs.CRRecentApr 13, 2026

Can we Watermark Low-Entropy LLM Outputs?

Noam Mazor, Andrew Morgan, Rafael Pass

This paper develops provably undetectable and robust watermarking schemes for LLM outputs even when the per-token entropy is only constant, removing previous dependencies on high entropy rates or larg…

View →

cs.CRcs.LGRecentMay 28, 2026

Fingerprinting Inference Systems of Large Language Models

Anna Wimbauer, Jonas Möller, Erik Imgrund, Konrad Rieck

This paper introduces a fingerprinting method that exploits subtle numerical deviations in the inference system components (like the engine or hardware) to reliably identify the specific components us…

View →

cs.GTcs.CRcs.LGRecentMay 8, 2026

Quotient Semivalues for False-Name-Resistant Data Attribution

Florian A. D. Burnat, Brittany I. Davidson

The paper introduces the quotient semivalue mechanism to provide fair data attribution that is resistant to contributors manipulating their reported identities by splitting or duplicating data.

View →

cs.CRcs.AIRecentMay 27, 2026

GEO-Bench: Benchmarking Ranking Manipulation in Generative Engine Optimization

Ojas Nimase, Zhe Chen, Gengpei Qi, Yue Zhao +1 more

The paper introduces GEO-Bench, a unified benchmark that standardizes the evaluation of various generative engine optimization (GEO) ranking manipulation attacks, demonstrating that black-box content…

View →

cs.CRcs.AIRecentMay 27, 2026

GEO-Bench: Benchmarking Ranking Manipulation in Generative Engine Optimization

Ojas Nimase, Zhe Chen, Gengpei Qi, Yue Zhao +1 more

GEO-Bench introduces a standardized benchmark to compare various ranking manipulation attacks (both black-box and white-box) on generative engines, demonstrating that black-box content rewriting can b…

View →

cs.CRcs.AIRecentApr 13, 2026

Beyond A Fixed Seal: Adaptive Stealing Watermark in Large Language Models

Shuhao Zhang, Yuli Chen, Jiale Han, Bo Cheng +1 more

The paper proposes Adaptive Stealing (AS), a novel and more robust watermark stealing algorithm that dynamically selects optimal attack perspectives to significantly increase the efficiency of comprom…

View →

cs.CRcs.AIcs.CYRecentMar 24, 2026

Robust Safety Monitoring of Language Models via Activation Watermarking

Toluwani Aremu, Daniil Ognev, Samuele Poppi, Nils Lukas

This paper addresses the vulnerability of existing LLM safety monitors to adaptive attackers and proposes activation watermarking, a technique that significantly improves detection robustness against…

View →

cs.LGcs.CRRecentMar 19, 2026

Revisiting Label Inference Attacks in Vertical Federated Learning: Why They Are Vulnerable and How to Defend

Yige Liu, Dexuan Xu, Zimai Guo, Yongzhi Cao +1 more

This paper analyzes label inference attacks in Vertical Federated Learning (VFL), demonstrating that existing attacks rely on feature-label distribution alignment, and proposes a zero-overhead defense…

View →

cs.CRcs.AIcs.CLRecentMay 28, 2026

Token Inflation: How Dishonest Providers Can Overcharge for Large Language Model Usage

Shahinul Hoque, Jinghuai Zhang, Jinyuan Sun, Fnu Suya

The paper demonstrates that the current per-token billing model for LLMs is susceptible to systematic overcharging because auditing frameworks must rely on evidence provided by the very companies that…

View →

cs.CRcs.AIcs.CLRecentMay 28, 2026

Token Inflation: How Dishonest Providers Can Overcharge for Large Language Model Usage

Shahinul Hoque, Jinghuai Zhang, Jinyuan Sun, Fnu Suya

The paper demonstrates that the current per-token billing model for LLMs is susceptible to systematic inflation because auditing frameworks must rely on evidence provided by the service provider, crea…

View →

cs.CLcs.CRcs.LGRecentApr 3, 2026

Learning the Signature of Memorization in Autoregressive Language Models

David Ilić, Kostadin Cvejoski, David Stanojević, Evgeny Grigorenko

The paper introduces a novel, transferable learned attack (LT-MIA) that detects a universal 'signature of memorization' in language models, achieving high accuracy across diverse model architectures (…

View →

cs.LGcs.CRRecentApr 5, 2026

Towards Unveiling Vulnerabilities of Large Reasoning Models in Machine Unlearning

Aobo Chen, Chenxu Zhao, Chenglin Miao, Mengdi Huai

The paper proposes a novel bi-level exact unlearning attack targeting Large Reasoning Models (LRMs) that forces incorrect final answers while generating misleading reasoning traces, highlighting new s…

View →