~ similar to 2605.10436v1· 20 results
The paper proposes a universal robustification framework to enhance drift-adaptive malware detectors against combined concept drift and adversarial attacks, significantly reducing attack success rates…
Ahmed Sabbah, Mohammed Kharma, Radi Jarrar, Samer Zein +1 more
This study longitudinally evaluates the adversarial robustness of Android malware detection systems over a decade, finding that temporal separation significantly degrades robustness due to concept dri…
The paper proposes a novel method to generate adversarial malware samples that evade deep learning detectors while simultaneously minimizing the detectable 'drift' signals, showing that similarity con…
The paper proposes REACT, an adversarial training framework that significantly enhances the robustness and few-shot performance of machine-generated text detection by having a Retrieval-Augmented Gene…
The paper introduces an end-to-end framework that not only detects network intrusions using deep learning but also generates actionable, citation-grounded mitigation reports using a Retrieval-Augmente…
The paper introduces a new adaptive jailbreak attack (JB-GCG) that successfully bypasses the state-of-the-art JBShield defense, and proposes a more robust defense (RTV) based on multi-layer representa…
Yong Zou, Haoran Li, Fanxiao Li, Shenyang Wei +4 more
The paper introduces REFORGE, a black-box red-teaming framework that uses adversarial image prompts to reveal persistent vulnerabilities in current Image Generation Model Unlearning (IGMU) methods.
The paper proposes a unified, architecture-agnostic framework that significantly improves the robustness of deepfake image detectors against adversarial attacks by focusing on higher-order frequency s…
FlowGuard introduces an identity-independent defense using flow matching to detect data-free model stealing attacks by identifying synthetic queries as out-of-distribution based on their lower-dimensi…
The paper identifies a critical vulnerability, the Camouflage Detection Gap (CDG), where standard LLM injection detectors fail dramatically when malicious payloads mimic the target domain's language a…
Xiangtao Meng, Wenyu Chen, Chuanchao Zang, Xinyu Gao +4 more
This paper systematically measures and explains how sequential model defenses can conflict, finding that 38.9% of ordered defense sequences cause measurable risk exacerbation due to anti-aligned param…
This paper demonstrates that in-domain pretraining of BERT significantly improves the detection of DNS exfiltration, particularly in maintaining a low false positive rate.
The paper introduces NeWTral, a framework that restores safety alignment to specialized LLM adapters without sacrificing their domain-specific knowledge, achieving a significant reduction in attack su…
Lixing Lin, Juli You, Yue Li, Luyun Lin +3 more
Reflect-Guard enhances LLM safety classifiers by integrating logical self-reflection, significantly improving detection of sophisticated adversarial jailbreak prompts.
Luca Minnei, Cristian Manca, Giorgio Piras, Angelo Sotgiu +5 more
The paper proposes a model-agnostic framework to evaluate combining Active Learning (AL) and Semi-Supervised Learning (SSL) techniques for malware detection, demonstrating that these combined methods…
The paper proposes CEAR, an ensemble-based method that combines empirical and certified defenses to achieve superior provable robustness against adversarial attacks in Deep Neural Networks.
Tri Cao, Yulin Chen, Hieu Cao, Yibo Li +7 more
The paper proposes WARD, a robust and efficient defense model that secures web agents against prompt injection attacks embedded in web content, achieving high recall and low false positives even again…
The paper introduces RefusalGuard, a novel fine-tuning framework that preserves the geometric structure of safety-relevant representations in LLMs, thereby mitigating the degradation of refusal behavi…
The paper proposes a cost-aware, adaptive maintenance framework using Reinforcement Learning (RL) and self-supervised learning to mitigate performance degradation (concept drift) in Android malware de…
The paper introduces a novel, transferable learned attack (LT-MIA) that detects a universal 'signature of memorization' in language models, achieving high accuracy across diverse model architectures (…