Papers similar to 2603.30034v1

Similar to 2603.30034v1 · 20 results

cs.CL · Recent · May 28, 2026

Linear Ensembles Wash Away Watermarks: On the Fragility of Distributional Perturbations in LLMs

Zhihao Wu, Gracia Gong, Qinglin Zhu, Yudong Chen +1 more

The paper demonstrates that combining outputs from multiple large language models (LLMs) effectively cancels out statistical watermarks, revealing a fundamental vulnerability in current AI text detect…

cs.LG, cs.AI · Recent · May 31, 2026

CEAR: Certified Ensemble Adversarial Robustness in DNNs

Daniel Sadig, Mohammadreza Maleki, Hamed Karimi, Reza Samavi

The paper proposes CEAR, an ensemble-based method that combines empirical and certified defenses to achieve superior provable robustness against adversarial attacks in Deep Neural Networks.

cs.CR · Recent · Jun 4, 2026

Robust Ensemble of Selectively Strengthened and Augmented Predictors

Parsa Memarzadehsaghezi, Zahra Hashemi, Pooria Madani, Mehran Ebrahimi

The paper proposes RESSAP, a novel ensemble framework that significantly enhances the robustness of machine learning classifiers against adversarial evasion attacks by combining feature selection, ens…

cs.CR, cs.AI, cs.LG · Recent · May 5, 2026

Undetectable Backdoors in Model Parameters: Hiding Sparse Secrets in High Dimensions

Sarthak Choudhary, Atharv Singh Patlan, Nils Palumbo, Ashish Hooda +2 more

The paper introduces Sparse Backdoor, a novel supply-chain attack that embeds a provably undetectable backdoor into pre-trained image classifiers by injecting structured sparse perturbations.

cs.CR, cs.AI, cs.LG · Recent · Jun 2, 2026

High-Precision APT Malware Attribution with Out-of-Scope Resilience

Peter Williams, Adam Sobey, Erisa Karafili

The paper introduces a high-precision APT malware attribution method that uses ranked binary classifiers with explicit abstention, significantly improving accuracy when encountering unknown or out-of-…

cs.CR, cs.LG · Recent · Apr 22, 2026

Towards Certified Malware Detection: Provable Guarantees Against Evasion Attacks

Nandakrishna Giri, Asmitha K. A., Serena Nicolazzo, Antonino Nocera +1 more

The paper proposes a certifiably robust malware detection framework using randomized smoothing and feature ablation to guarantee detection accuracy against metamorphic evasion attacks.

cs.CR · Recent · May 4, 2026

Revisiting JBShield: Breaking and Rebuilding Representation-Level Jailbreak Defenses

Kemal Derya, Berk Sunar

The paper introduces a new adaptive jailbreak attack (JB-GCG) that successfully bypasses the state-of-the-art JBShield defense, and proposes a more robust defense (RTV) based on multi-layer representa…

cs.LG, cs.CR · Recent · May 25, 2026

On Reliability of Efficient Membership Inference Vulnerability Evaluation

Joonas Jälkö, Gauri Pradhan, Ossi Räisä, Antti Honkela

This paper analyzes the reliability of efficient membership inference attack (MIA) evaluation methods, demonstrating that standard aggregation techniques introduce biases that compromise accurate vuln…

cs.CV, cs.AI, cs.CR · Recent · Apr 12, 2026

Toward Accountable AI-Generated Content on Social Platforms: Steganographic Attribution and Multimodal Harm Detection

Xinlei Guan, David Arosemena, Tejaswi Dhandu, Kuan Huang +6 more

The paper proposes an end-to-end forensic pipeline using steganographic attribution and multimodal harm detection to reliably trace and attribute harmful misuse of AI-generated imagery on social platf…

cs.CR · Recent · May 13, 2026

From Compression to Accountability: Harmless Copyright Protection for Dataset Distillation

Yan Liang, Ziyuan Yang, Mengyu Sun, Joey Tianyi Zhou +1 more

The paper proposes SubPopMark, a novel subpopulation-driven framework that injects harmless, verifiable markers into distilled datasets to prevent copyright infringement and data leakage.

cs.CR · Recent · May 29, 2026

When Entropy Is Not Enough: Multi-Modal Classification of Encrypted and Compressed Data Fragments

Fabio De Gaspari, Dorjan Hitaj, Samuele Salaris, Luigi V. Mancini

The paper proposes Triumvir, a multi-modal ensemble architecture that significantly improves the classification of small, raw data fragments to distinguish between encrypted and compressed data, outpe…

cs.CR, cs.AI · Recent · Mar 29, 2026

SNEAKDOOR: Stealthy Backdoor Attacks against Distribution Matching-based Dataset Condensation

He Yang, Dongyi Lv, Song Ma, Wei Xi +1 more

Sneakdoor introduces a novel backdoor attack method that enhances stealthiness in dataset condensation by using a generative module to create input-aware triggers, achieving high attack efficacy while…

cs.GT, cs.CR, cs.LG · Recent · May 8, 2026

Quotient Semivalues for False-Name-Resistant Data Attribution

Florian A. D. Burnat, Brittany I. Davidson

The paper introduces the quotient semivalue mechanism to provide fair data attribution that is resistant to contributors manipulating their reported identities by splitting or duplicating data.

cs.CR, stat.AP · Recent · May 8, 2026

Combating Organized Platform Abuse: Amplifying Weak Risk Signals with Structural Information

Meng He, Jia Long Loh

The paper proposes a novel structural invariant approach, derived from the economic constraints of fraud, that amplifies weak, low-precision signals into highly accurate fraud detections without requi…

cs.CR · Recent · May 23, 2026

Ellipsoid Control: A White-list Jailbreak Defense via Benign Latent Modeling

Luoyu Chen, Weiqi Wang, Zhiyi Tian, Feng Wu +2 more

The paper proposes Ellipsoid Control, a white-list defense mechanism that uses benign data geometry to constrain model updates, thereby enhancing jailbreak safety while preserving the utility of harml…

cs.LG, cs.CR · Recent · Apr 21, 2026

Mechanistic Anomaly Detection via Functional Attribution

Hugo Lyons Keenan, Christopher Leckie, Sarah Erfani

The paper proposes reframing mechanistic anomaly detection (MAD) as a functional attribution problem, using influence functions to measure how much a model's output depends on specific input samples,…

cs.CR, cs.AI, cs.CL · Recent · Mar 23, 2026

SecureBreak -- A dataset towards safe and secure models

Marco Arazzi, Vignesh Kumar Kembu, Antonino Nocera

The paper introduces SecureBreak, a manually annotated, safety-oriented dataset designed to help detect harmful outputs from large language models (LLMs) that bypass existing security alignments.

cs.CV, cs.CR · Recent · May 5, 2026

A Deeper Dive into the Irreversibility of PolyProtect: Making Protected Face Templates Harder to Invert

Vedrana Krivokuća Hahn, Jérémy Maceiras, Sébastien Marcel

The paper enhances the security of the PolyProtect biometric template protection method by proposing a key selection algorithm that significantly increases the difficulty of inverting protected face t…

cs.CR, cs.AI · Recent · Apr 14, 2026

SpanKey: Dynamic Key Space Conditioning for Neural Network Access Control

WenBin Yan

SpanKey proposes a lightweight method to control neural network access by conditioning intermediate activations on secret keys constrained to a defined subspace, enabling dynamic gating without weight…

cs.CR, cs.AI, cs.LG · Recent · Jun 4, 2026

SlotGCG: Exploiting the Positional Vulnerability in LLMs for Jailbreak Attacks

Seungwon Jeong, Jiwoo Jeong, Hyeonjin Kim, Yunseok Lee +1 more

The paper introduces SlotGCG, an improved jailbreak attack method that systematically searches for the most vulnerable token insertion positions (slots) within a prompt, significantly boosting attack…
