~ similar to 2603.28043v1 · 20 results
The paper systematically maps LLM agent vulnerabilities by testing 10,000 prompt variations, finding that 'goal reframing' language is the primary trigger for exploitation, rather than broad adversari…
The paper proposes a novel structural invariant approach, derived from the economic constraints of fraud, that amplifies weak, low-precision signals into highly accurate fraud detections without requi…
Roy Ricaldi, Maximilian Schafer, Philipp Zech, Luca Allodi +2 more
This study provides a longitudinal analysis of dark web content, revealing that cybercrime discussions are dominated by a few persistent core topics rather than rapidly shifting themes.
The paper proposes a graph-based framework for detecting attacks in LLM agent tool-call traffic, finding that content-level embeddings are crucial for high accuracy and that tree ensembles on these em…
The paper introduces BAIT, a three-step jailbreak framework that systematically forces large language models to disclose harmful information by leveraging their internal reasoning and consistency tend…
The paper proposes an unsupervised method using multiple statistical indicators to detect adversarial or compromised context documents in Retrieval Augmented Generation (RAG) systems, even without kno…
This paper provides a large-scale empirical analysis of indirect prompt injections found in webpages, revealing that prompt-based interference is a widespread, persistent, and growing threat targeting…
The paper introduces the PROMPT framework to systematically analyze and mitigate privacy risks in online propaganda detection pipelines, demonstrating that current widely used methods are often non-co…
This study provides the first large-scale analysis of video piracy on Telegram, quantifying its massive financial impact and developing a resilient detection framework, Anti-RIP, to combat it.
This paper demonstrates that retrieval-augmented in-context learning systems for document QA are vulnerable to membership inference attacks, proposing novel black-box methods that exploit query prefix…
The paper introduces an end-to-end framework that not only detects network intrusions using deep learning but also generates actionable, citation-grounded mitigation reports using a Retrieval-Augmente…
Duanyi Yao, Changyue Li, Zhicong Huang, Cheng Hong +1 more
The paper introduces Hidden Ads, a novel backdoor attack for Vision-Language Models (VLMs) that injects unauthorized advertisements by exploiting natural, recommendation-seeking user behaviors, mainta…
The paper introduces the Sovereign Context Protocol (SCP), an open-source, attribution-aware data access layer designed to standardize how Large Language Models (LLMs) connect to and track usage of hu…
The paper introduces Tree structured Injection for Payloads (TIP), a novel black-box attack framework that reliably generates stealthy injection payloads to seize control of LLM agents utilizing the M…
The paper introduces TeleHunt, a comprehensive framework and tool that systematically evaluates various strategies for efficiently discovering cybercriminal communities operating on Telegram.
The paper proposes an embarrassingly simple detector that monitors model extraction attacks by testing whether the aggregate distribution of incoming LLM queries deviates from the historical distribut…
The paper introduces Involuntary In-Context Learning (IICL), an effective few-shot pattern completion attack that can bypass safety alignments in large language models, achieving a 24.0% bypass rate a…
The paper proposes a framework to intentionally evade malware detectors by adding a small number of benign API imports, successfully demonstrating targeted misclassification into a chosen benign categ…
PHANTOM is a novel framework that generates highly convincing, context-aware honeytokens by incorporating deep organizational knowledge, significantly improving their believability and detection resis…
The paper demonstrates that generative AI can automate and scale highly personalized, context-aware spear-phishing attacks using only public social media data, resulting in messages that are significa…