~ similar to 2605.31090· 20 results
Tim Nielen, Sameer Ambekar, Johannes Kiechle, Daniel M. Lang +1 more
This paper identifies prediction bias, a failure mode of entropy minimization in test-time adaptation, and proposes Distribution Shift Bias Reduction (DSBR) to stabilize adaptation and prevent model c…
The paper introduces the Calibrated Entropy Score (CES), a single-pass, black-box method that uses the distribution of token-level entropies to detect model hallucinations with high accuracy and forma…
The paper introduces a novel, transferable learned attack (LT-MIA) that detects a universal 'signature of memorization' in language models, achieving high accuracy across diverse model architectures (…
The paper introduces an Item Response Theory (IRT)-based indicator that effectively identifies likely mislabeled items in existing LLM benchmarks, revealing systematic errors in labeling and model spe…
Salim I. Amoukou, Emanuele Albini, Tom Bewley, Saumitra Mishra +1 more
The paper introduces Entropic Projection Alignment (EPA), a unified framework that estimates, explains, and improves model performance under distribution shift by aligning source and target distributi…
The paper introduces Inconsistency-Aware Minimization (IAM), a novel training objective that uses a label-free measure called local inconsistency to improve model generalization, particularly in semi-…
Zhihao Wu, Gracia Gong, Qinglin Zhu, Yudong Chen +1 more
The paper demonstrates that combining outputs from multiple large language models (LLMs) effectively cancels out statistical watermarks, revealing a fundamental vulnerability in current AI text detect…
The paper proposes REED, a post-training representation editing method that significantly improves cross-domain linguistic steganalysis performance by deterministically editing intermediate feature re…
The paper proposes Triumvir, a multi-modal ensemble architecture that significantly improves the classification of small, raw data fragments to distinguish between encrypted and compressed data, outpe…
The study demonstrates that robust, domain-invariant representations of synthetic deception can be rapidly entrenched in LLMs using modest fine-tuning, detectable by linear probes even in early layers…
The paper introduces an entropy-aware masking strategy for Masked Language Modeling (MLM) that targets informative and uncertain tokens, achieving up to a 5% performance improvement on GLUE scores.
Haodong Zhao, Tianyi Xu, Tianhang Zhao, Zhuosheng Zhang +1 more
GradSentry introduces a novel backdoor sample filtering method that uses the spectral entropy of individual sample gradients to detect poisoned data during LLM fine-tuning, proving effective even at h…
This study comparatively evaluates four CNN architectures (VGG16, ResNet50, EfficientNetB0, and XceptionNet) for fake image detection, finding VGG16 achieved the highest accuracy (91%).
This paper proposes a 3D CNN detector that leverages temporal artifacts to accurately identify high-quality deepfake videos, demonstrating robust detection even after social media re-encoding.
The paper introduces ImageProtector, a user-side method that embeds an imperceptible perturbation into images to prevent Multi-modal Large Language Models (MLLMs) from analyzing and extracting sensiti…
This paper systematically diagnoses the failure modes of linear deception probes in LLMs, finding that while single-direction probes are insufficient, multi-dimensional probes can recover robust detec…
Mathias Graf, Marco Willi, Melanie Mathys, Michael Aerni +3 more
DeepSignature proposes a novel, cryptographically verifiable watermarking system that uses deep neural networks to embed digital signatures into images, enabling robust source attribution and near 100…
The paper proposes evaluating certified training methods by comparing their Pareto fronts across the natural-certified accuracy trade-off, revealing superior performance and previously unappreciated c…
The paper introduces a structured benchmark (TGAD) showing that current text-guided anomaly detection models often overstate their language conditioning, as performance significantly degrades when the…
Xinlei Guan, David Arosemena, Tejaswi Dhandu, Kuan Huang +6 more
The paper proposes an end-to-end forensic pipeline using steganographic attribution and multimodal harm detection to reliably trace and attribute harmful misuse of AI-generated imagery on social platf…