~ similar to 2603.30034v1· 20 results
Zhihao Wu, Gracia Gong, Qinglin Zhu, Yudong Chen +1 more
The paper demonstrates that combining outputs from multiple large language models (LLMs) effectively cancels out statistical watermarks, revealing a fundamental vulnerability in current AI text detect…
The paper proposes CEAR, an ensemble-based method that combines empirical and certified defenses to achieve superior provable robustness against adversarial attacks in Deep Neural Networks.
The paper proposes RESSAP, a novel ensemble framework that significantly enhances the robustness of machine learning classifiers against adversarial evasion attacks by combining feature selection, ens…
The paper introduces Sparse Backdoor, a novel supply-chain attack that embeds a provably undetectable backdoor into pre-trained image classifiers by injecting structured sparse perturbations.
The paper introduces a high-precision APT malware attribution method that uses ranked binary classifiers with explicit abstention, significantly improving accuracy when encountering unknown or out-of-…
The paper proposes a certifiably robust malware detection framework using randomized smoothing and feature ablation to guarantee detection accuracy against metamorphic evasion attacks.
The paper introduces a new adaptive jailbreak attack (JB-GCG) that successfully bypasses the state-of-the-art JBShield defense, and proposes a more robust defense (RTV) based on multi-layer representa…
This paper analyzes the reliability of efficient membership inference attack (MIA) evaluation methods, demonstrating that standard aggregation techniques introduce biases that compromise accurate vuln…
Xinlei Guan, David Arosemena, Tejaswi Dhandu, Kuan Huang +6 more
The paper proposes an end-to-end forensic pipeline using steganographic attribution and multimodal harm detection to reliably trace and attribute harmful misuse of AI-generated imagery on social platf…
Yan Liang, Ziyuan Yang, Mengyu Sun, Joey Tianyi Zhou +1 more
The paper proposes SubPopMark, a novel subpopulation-driven framework that injects harmless, verifiable markers into distilled datasets to prevent copyright infringement and data leakage.
The paper proposes Triumvir, a multi-modal ensemble architecture that significantly improves the classification of small, raw data fragments to distinguish between encrypted and compressed data, outpe…
Sneakdoor introduces a novel backdoor attack method that enhances stealthiness in dataset condensation by using a generative module to create input-aware triggers, achieving high attack efficacy while…
The paper introduces the quotient semivalue mechanism to provide fair data attribution that is resistant to contributors manipulating their reported identities by splitting or duplicating data.
The paper proposes a novel structural invariant approach, derived from the economic constraints of fraud, that amplifies weak, low-precision signals into highly accurate fraud detections without requi…
Luoyu Chen, Weiqi Wang, Zhiyi Tian, Feng Wu +2 more
The paper proposes Ellipsoid Control, a white-list defense mechanism that uses benign data geometry to constrain model updates, thereby enhancing jailbreak safety while preserving the utility of harml…
The paper proposes reframing mechanistic anomaly detection (MAD) as a functional attribution problem, using influence functions to measure how much a model's output depends on specific input samples,…
The paper introduces SecureBreak, a manually annotated, safety-oriented dataset designed to help detect harmful outputs from large language models (LLMs) that bypass existing security alignments.
The paper enhances the security of the PolyProtect biometric template protection method by proposing a key selection algorithm that significantly increases the difficulty of inverting protected face t…
SpanKey proposes a lightweight method to control neural network access by conditioning intermediate activations on secret keys constrained to a defined subspace, enabling dynamic gating without weight…
Seungwon Jeong, Jiwoo Jeong, Hyeonjin Kim, Yunseok Lee +1 more
The paper introduces SlotGCG, an improved jailbreak attack method that systematically searches for the most vulnerable token insertion positions (slots) within a prompt, significantly boosting attack…