Papers similar to 2603.29182v1

~ similar to 2603.29182v1· 20 results

cs.LGcs.CRRecentMay 18, 2026

A No-Defense Defense Against Gradient-Based Adversarial Attacks on ML-NIDS: Is Less More?

The paper demonstrates that simpler, shallower Deep Neural Network architectures with reduced features and ReLU activations can inherently improve the robustness of ML-NIDS against gradient-based adve…

View →

cs.CRcs.AIcs.LGRecentMay 14, 2026

One Step to the Side: Why Defenses Against Malicious Finetuning Fail Under Adaptive Adversaries

Itay Zloczower, Eyal Lenga, Gilad Gressel, Yisroel Mirsky

The paper demonstrates that current defenses against malicious fine-tuning of foundation models are insufficient because they only address fixed attacks, and introduces a unified adaptive attack that…

View →

cs.CRcs.AIcs.LGRecentMay 22, 2026

Adversarial Vulnerability Under Temporal Concept Drift: A Longitudinal Study of Android Malware Detection

Ahmed Sabbah, Mohammed Kharma, Radi Jarrar, Samer Zein +1 more

This study longitudinally evaluates the adversarial robustness of Android malware detection systems over a decade, finding that temporal separation significantly degrades robustness due to concept dri…

View →

cs.CRcs.CLRecentApr 24, 2026

Training a General Purpose Automated Red Teaming Model

Aishwarya Padmakumar, Leon Derczynski, Traian Rebedea, Christopher Parisien

The paper proposes a general-purpose pipeline to train automated red teaming models capable of generating attacks for arbitrary adversarial goals, overcoming the limitations of current methods that ar…

View →

cs.CRcs.AIcs.LGRecentJun 2, 2026

Black-box, Adaptive, Efficient, Transferable, Harmful, Applicable... Attacks Are All You Need to Break LLMs

Vincent Limbach, Jonas Dornbusch, David Lüdke, Stephan Günnemann +1 more

The paper introduces Indirect Harm Optimization (IHO), a novel black-box, adaptive, and efficient attack method that significantly improves jailbreak success rates against LLMs, aiming to provide a st…

View →

cs.LGcs.CRcs.CVRecentMay 22, 2026

Sample-wise Targeted Adversarial Attacks on Test-time Adaptation

Phuc Duc Nguyen, Quang Duc Nguyen

The paper introduces a sample-wise targeted adversarial attack that successfully misclassifies only specific, triggered inputs during test-time adaptation while maintaining the overall label distribut…

View →

cs.LGcs.CRRecentMay 26, 2026

Open-Weight LLM Fine-Tuning Defenses are Susceptible to Simple Attacks

Kevin Kuo, Chhavi Yadav, Virginia Smith

This paper demonstrates that existing open-weight LLM safeguards are vulnerable to simple, non-gradient-based attacks like abliteration and prefilling, significantly increasing the attack success rate…

View →

cs.CRcs.LGcs.SERecentJun 3, 2026

Toward a Generalized Defense Across Sparse, Continuous, and Structured Parameter Attacks

Bin Duan, Zeyu Bai, Guowei Yang

The paper introduces ParDef, a generalized defense mechanism that effectively mitigates various types of parameter attacks on deep neural networks while maintaining high performance.

View →

cs.CRRecentApr 8, 2026

Can Drift-Adaptive Malware Detectors Be Made Robust? Attacks and Defenses Under White-Box and Black-Box Threats

Adrian Shuai Li, Md Ajwad Akil, Elisa Bertino

The paper proposes a universal robustification framework to enhance drift-adaptive malware detectors against combined concept drift and adversarial attacks, significantly reducing attack success rates…

View →

cs.CRRecentJun 1, 2026

On Improving Robustness of Deepfake Image Detectors

Abu Taib Mohammed Shahjahan, Mohammad Mannan, Abdessamad Ben Hamza, Amr Youssef

The paper proposes a unified, architecture-agnostic framework that significantly improves the robustness of deepfake image detectors against adversarial attacks by focusing on higher-order frequency s…

View →

cs.LGcs.AIcs.CVRecentMay 30, 2026

SORA: Free Second-Order Attacks in Fast Adversarial Training

Mazdak Teymourian, Ramtin Moslemi, Farzan Rahmani, Mohammad Hossein Rohban

The paper introduces SORA, an adaptive adversarial training method that dynamically adjusts perturbation sizes to prevent Catastrophic Overfitting, achieving state-of-the-art robustness and clean accu…

View →

cs.CRcs.AIcs.DCRecentApr 10, 2026

XFED: Non-Collusive Model Poisoning Attack Against Byzantine-Robust Federated Classifiers

Israt Jahan Mouri, Muhammad Ridowan, Muhammad Abdullah Adnan

The paper introduces XFED, a novel non-collusive model poisoning attack that demonstrates the feasibility of compromising Federated Learning systems without requiring coordination among attackers, byp…

View →

cs.CRcs.LGRecentMay 9, 2026

Enhancing Adversarial Robustness in Network Intrusion Detection: A Layer-wise Adaptive Regularization Approach

Hira Nasir, Eiman Javed, Balawal Shabir, Zunera Jalil +1 more

The paper proposes LARAR, a novel layer-wise adaptive regularization approach that enhances the adversarial robustness of neural network-based Network Intrusion Detection Systems by analyzing and miti…

View →

cs.LGcs.AIRecentMay 31, 2026

CEAR: Certified Ensemble Adversarial Robustness in DNNs

Daniel Sadig, Mohammadreza Maleki, Hamed Karimi, Reza Samavi

The paper proposes CEAR, an ensemble-based method that combines empirical and certified defenses to achieve superior provable robustness against adversarial attacks in Deep Neural Networks.

View →

cs.LGcs.CRcs.CVRecentMay 25, 2026

When Interpretability Becomes a Liability: Adversarial Attacks on CBM Concept Layers

Aditya Sridhar

This paper demonstrates that Concept Bottleneck Models (CBMs), despite their interpretability, are highly vulnerable to targeted adversarial attacks that manipulate semantic concepts, and proposes SPE…

View →

cs.CRcs.AIEmpiricalRecentJun 29, 2026

Defending Against Harmful Supervision Hidden in Benign Samples

Bang An, Yibo Yang, Dandan Guo, Ebtisam Alshehri +2 more

The paper proposes Dual-Reference SFT to mitigate harmful fine-tuning of language models by embedding harmful QA pairs in benign training samples.

View →

cs.CRcs.LGRecentJun 2, 2026

RogueMerge: Robust and Unified Attacks against LLM Model Merging

Jinghuai Zhang, Yetian He, Kunlin Cai, Han Zhao +2 more

RogueMerge introduces a unified framework to robustly attack LLM model merging by addressing the challenges of autoregressive decoding, unknown merging configurations, and prompt generalization, signi…

View →

econ.THcs.CRcs.GTRecentMay 5, 2026

The Adversarial Discount -- AI, Signal Correlation, and the Cybersecurity Arms Race

James W. Bono

The paper models the cybersecurity arms race using a contest-theoretic framework, showing that full cross-correlation of threat intelligence can neutralize the attacker's structural advantage from inc…

View →

cs.CRcs.LGRecentMay 11, 2026

FedSurrogate: Backdoor Defense in Federated Learning via Layer Criticality and Surrogate Replacement

Fatima Z. Abacha, Sin G. Teo, Yuanxiang Wu, Lucas C. Cordeiro +1 more

FedSurrogate introduces a novel backdoor defense for Federated Learning that uses layer-criticality analysis and surrogate replacement to significantly reduce false positives while maintaining high mo…

View →

cs.CRcs.AIRecentMar 17, 2026

Security Assessment and Mitigation Strategies for Large Language Models: A Comprehensive Defensive Framework

Taiwo Onitiju, Iman Vakilinia

The paper establishes a standardized security assessment framework and develops a multi-layered defensive system, demonstrating that systematic testing and external defenses are crucial for safe LLM d…

View →