Papers similar to 2606.01298

~ similar to 2606.01298· 18 results

cs.CLRecentMay 29, 2026

Disagreeing Rationales: Rethinking Classification and Explainability Evaluation in Hate Speech Detection

Benedetta Muscato, Beiduo Chen, Gizem Gezici, Barbara Plank +1 more

This paper proposes a unified evaluation framework for hate speech detection that systematically assesses model performance and explainability across various label and rationale representation spaces,…

View →

cs.CLcs.AIcs.CVRecentMay 29, 2026

FBHM: Functional Benchmarking and Steering of VLMs for Hateful Meme Detection

Paramananda Bhaskar, Naquee Rizwan, Daksh Jogchand, Saurabh Kumar Pandey +1 more

The paper introduces FBHM, a new benchmark for hateful memes, and proposes LSV, a steering vector method that significantly improves VLM performance by addressing the generalization gap.

View →

cs.CRRecentMay 12, 2026

A microservices-based endpoint monitoring platform with predictive NLP models for real-time security and hate-speech risk alerting

Darlan Noetzold, Anubis Graciela De Moraes Rossetto, Juan Francisco De Paz Santana, Valderi Reis Quietinho Leithardt

The paper proposes a unified, microservices-based platform that integrates endpoint telemetry and predictive NLP models to provide real-time, correlated alerting for security risks and hate speech.

View →

cs.CLcs.AIRecentMay 27, 2026

Evaluating the Realism of LLM-powered Social Agents: A Case Study of Reactions to Spanish Online News

Alejandro Buitrago López, Alberto Ortega Pastor, Javier Pastor-Galindo, José A. Ruipérez-Valiente

The paper evaluates LLM-generated reactions to Spanish online news, finding that off-the-shelf models fail to accurately reproduce the measurable properties of real audience discourse, and even fine-t…

View →

cs.SDcs.AIcs.CRRecentMay 15, 2026

Beyond Content: A Comprehensive Speech Toxicity Dataset and Detection Framework Incorporating Paralinguistic Cues

Zhongjie Ba, Liang Yi, Peng Cheng, Qingcao Li +2 more

The paper introduces ToxiAlert-Bench, a large-scale audio dataset that uniquely annotates both textual and paralinguistic sources of toxicity, and proposes a dual-head neural network that significantl…

View →

cs.CRcs.AIRecentMay 1, 2026

A Sentence Relation-Based Approach to Sanitizing Malicious Instructions

Soumil Datta, Melissa Umble, Daniel S. Brown, Guanhong Tao

The paper introduces SONAR, a prompt sanitization framework that uses natural language inference metrics to identify and remove malicious instructions injected into LLM prompts, achieving near-zero at…

View →

cs.LGcs.CLRecentMay 28, 2026

Measuring, Localizing, and Ablating Alignment Signatures in LLMs

Aniket Anand, Janvijay Singh, Zhewei Sun, Dilek Hakkani-Tür +1 more

The paper demonstrates that the AI-like style introduced by post-training alignment can be measured, localized, and causally removed using a novel ablation technique called PASTA.

View →

cs.CRRecentApr 25, 2026

AsmRAG: LLM-Driven Malware Detection by Retrieving Functionally Similar Assembly Code

ElMouatez Billah Karbab

AsmRAG is a novel framework that improves malware detection by treating it as an evidence-based retrieval task using a code-specialized LLM, achieving high accuracy while providing transparent forensi…

View →

cs.AIcs.CYcs.HCRecentMay 27, 2026

When Models Disagree: Rethinking LLM Evaluation for Public Comment Analysis

Aisha Najera, Alvin Moon, Vedant Srinivasan, Rajesh Veeraraghavan

The paper proposes an Interpretive Audit Pipeline to evaluate LLMs for public comment analysis, arguing that measuring inter-model disagreement is crucial because standard accuracy metrics fail to det…

View →

cs.CRRecentApr 13, 2026

A Synthetic Conversational Smishing Dataset for Social Engineering Detection

Carl Lochstampfor, Ayan Roy

The paper introduces a synthetic dataset of multi-round conversations to detect conversational smishing, finding that XGBoost with TF-IDF features achieved the best performance (72.5% accuracy).

View →

cs.CLcs.AIRecentMay 30, 2026

LinguIUTics at PsyDefDetect: Iterative Imbalance-Aware Fine-tuning of Qwen3-8B for Psychological Defense Mechanism Classification

Shefayat E Shams Adib, Ahmed Alfey Sani, Md Hasibur Rahman Alif, Ajwad Abrar

The paper introduces LinguIUTics, a system that significantly improves the classification of rare psychological defense mechanisms in conversational text by fine-tuning Qwen3-8B using specialized imba…

View →

cs.SEcs.CRRecentApr 1, 2026

SERSEM: Selective Entropy-Weighted Scoring for Membership Inference in Code Language Models

Kıvanç Kuzey Dikici, Serdar Kara, Semih Çağlar, Eray Tüzün +1 more

SERSEM introduces a selective entropy-weighted scoring framework to significantly improve Membership Inference Attacks (MIAs) against code LLMs by focusing on human-centric coding anomalies rather tha…

View →

cs.CRcs.AIcs.CYRecentMay 13, 2026

Identifying AI Web Scrapers Using Canary Tokens

Steven Seiden, Triss Ren, Caroline Zhang, Taein Kim +2 more

The paper proposes a novel, scalable technique using unique canary tokens to automatically and accurately identify which web scrapers are feeding data to specific Large Language Models (LLMs).

View →

cs.CRcs.AIRecentApr 6, 2026

SALLIE: Safeguarding Against Latent Language & Image Exploits

Guy Azov, Ofer Rivlin, Guy Shtar

SALLIE introduces a lightweight, modal-agnostic runtime detection framework that effectively safeguards LLMs and VLMs against both textual and visual jailbreaks and prompt injections without performan…

View →

cs.CRRecentMay 12, 2026

PhishSigma++: Malicious Email Detection with Typed Entity Relations

Shang Shang, Ruiqi Wang, Ruijie Qi, Hao Li +3 more

PhishSigma++ is a novel entity-relation-based detector that improves malicious email detection by focusing on invariant functional relationships between typed entities, significantly outperforming tex…

View →

cs.CRRecentApr 8, 2026

RefineRAG: Word-Level Poisoning Attacks via Retriever-Guided Text Refinement

Ziye Wang, Guanyu Wang, Kailong Wang

RefineRAG introduces a novel word-level poisoning framework that significantly enhances knowledge poisoning attacks against RAG systems, achieving state-of-the-art effectiveness and transferability to…

View →

cs.CLRecentJun 1, 2026

On the Salience of Low-Probability Tokens for AI-Generated Text Detection: A Multiscale Uncertainty Perspective

Yikai Guo, Bin Wang, Xilai Fan, Wenjun Ke +1 more

The paper proposes 'Uncertainty,' a multiscale uncertainty estimator that focuses on low-probability tokens to improve the detection of AI-generated text by addressing boilerplate dominance and score…

View →

cs.CRRecentMay 8, 2026

When the Ruler is Broken: Parsing-Induced Suppression in LLM-Based Security Log Evaluation

Chaitanya Vilas Garware, Sharif Noor Zisad

The paper demonstrates that relying on strict regular-expression parsing for evaluating LLM-based security log classifiers introduces systematic errors, potentially causing a functional model to appea…

View →