~ similar to 2605.12813v2· 20 results
The paper introduces Responsible Contrastive Soft Prompting (RCSP), a parameter-efficient method using soft prompts to improve LLM reliability by simultaneously suppressing hallucinations, encouraging…
Ye Sun, Xin Wang, Jiaming Zhang, Yifeng Gao +6 more
DarkLLM introduces a novel framework that uses a Large Language Model (LLM) to translate natural language instructions into flexible, latent adversarial attack vectors, demonstrating a systemic vulner…
The paper introduces an A*-inspired framework to generate highly effective and efficient adversarial prompts that cause LLMs to hallucinate commonsense errors while maintaining the original prompt's i…
This survey provides a comprehensive taxonomy and vulnerability-centric analysis of adversarial attacks targeting Multimodal Large Language Models (MLLMs), offering an explanatory framework for enhanc…
AttackEval systematically evaluates the effectiveness of 250 prompt injection prompts across ten attack categories, finding that composite and obfuscation attacks are highly effective against current…
The paper introduces Evidence-Carrying Agents (ECA) to prevent multimodal agents from executing privileged actions based on unsupported or hallucinated perceptual claims, achieving near-zero unsafe ex…
Hao Yang, Zhuo Ma, Yang Liu, Yilong Yang +2 more
The paper introduces CrossMPI, a novel cross-modal prompt injection attack that uses image-only perturbations to steer the interpretation of both textual and visual inputs in Large Vision-Language Mod…
The paper introduces Adaptive Unlearning (AU), a post-deployment framework that surgically suppresses code-related hallucinations, significantly reducing the risk of package confusion attacks like slo…
Yusheng He, Jizhe Zhou, Xia Du, Zheng Lin +2 more
This paper systematically analyzes how different architectural components of Large Vision-Language Models (LVLMs) contribute to hallucination robustness, finding that joint enhancement of visual fidel…
Hanze Li, Jinhao You, Yichen Guo, Kai Tang +2 more
The paper introduces DeLask, a novel decoding framework that dynamically skips or partially aggregates problematic decoder layers to significantly mitigate hallucinations in Large Language Models.
The paper proposes BRACS, a training-free steering framework that adaptively corrects visual grounding failures in large vision-language models, significantly reducing object hallucination without sac…
The paper evaluates the adversarial robustness of two open-source Vision-Language Models (LLaVA and Qwen2.5-VL) in a simulated e-commerce environment, finding that while LLaVA is vulnerable to gradien…
The paper introduces the Calibrated Entropy Score (CES), a single-pass, black-box method that uses the distribution of token-level entropies to detect model hallucinations with high accuracy and forma…
SilentRetrieval introduces a sophisticated, two-stage data poisoning attack that successfully hijacks Retrieval-Augmented Generation (RAG) systems by injecting adversarially crafted, yet highly fluent…
The paper introduces Neutral Prompting Attacks (NPA), a stealthy method showing that semantically benign prompts can covertly increase package hallucination in coding agents, creating new software sup…
The paper proposes the Adversarial Prompt Disentanglement (APD) framework, a novel defense mechanism that proactively identifies and neutralizes malicious components in LLM prompts, achieving over 85%…
The paper proposes the Adversarial Prompt Disentanglement (APD) framework, a novel defense that proactively identifies and neutralizes malicious components in LLM prompts, achieving over 85% reduction…
The study demonstrates that robust, domain-invariant representations of synthetic deception can be rapidly entrenched in LLMs using modest fine-tuning, detectable by linear probes even in early layers…
The paper introduces Obsessive Experience Poisoning (OEP), a low-privilege black-box attack that poisons self-evolving LLM agents by generating locally correct but harmful experiences, causing dangero…
This paper introduces 'Visual Inception,' a novel attack that poisons long-term memory in agentic recommender systems using images, and proposes CognitiveGuard, a dual-process defense framework to mit…