ArXivCSExplorer
☆☆Bookmarks🏆RSSHow to UseFAQ
Built with and by Teycir Ben Soltane•
How to Use•FAQ•GitHub•arXiv.org•
Share:

~ similar to 2605.15047v1· 20 results

cs.CYcs.CRRecentMay 6, 2026

An Evaluation of Chat Safety Moderations in Roblox

Priya Kaushik, Sonja Brown, Rakibul Hasan, Sazzadur Rahaman

This study evaluated Roblox's chat moderation system using a large corpus of 2 million messages, finding that numerous unsafe messages related to grooming, harassment, and self-harm continue to escape…

View →
cs.CRcs.CYcs.LGRecentApr 11, 2026

"bot lane noob" Towards Deployment of NLP-based Toxicity Detectors in Video Games

Jonas Ave, Irdin Pekaric, Matthias Frohner, Giovanni Apruzzese

This paper addresses the lack of specialized NLP tools for detecting toxicity in real-time video game chat by creating a large, fine-grained dataset and developing a superior, domain-specific detector…

View →
cs.CRRecentJun 4, 2026

Cheating in Multiplayer Online Games: a Dataset

Hugo Bertin, Marc Dacier, Yérom-David Bromberg

This paper introduces a novel, comprehensive dataset that logs various cheating activities, including difficult-to-detect network flow disruption cheats, for the purpose of developing robust detection…

View →
cs.CRcs.AIcs.CLRecentMay 12, 2026

SkillSafetyBench: Evaluating Agent Safety under Skill-Facing Attack Surfaces

Chang Jin, An Wang, Zeming Wei, Kai Wang +6 more

The paper introduces SkillSafetyBench, a comprehensive benchmark demonstrating that agent safety failures often stem from adversarial influences within reusable skills and execution environments, rath…

View →
cs.CRcs.AIcs.CLRecentApr 3, 2026

An Independent Safety Evaluation of Kimi K2.5

Zheng-Xin Yong, Parv Mahajan, Andy Wang, Ida Caspary +11 more

The paper conducts a preliminary safety evaluation of the open-weight LLM Kimi K2.5, finding that while it is highly capable, it exhibits concerning dual-use risks, particularly regarding CBRNE misuse…

View →
cs.CRcs.CYcs.LGRecentMay 7, 2026

Gaming the Metric, Not the Harm: Certifying Safety Audits against Strategic Platform Manipulation

Florian A. D. Burnat, Brittany I. Davidson

The paper demonstrates that current safety audit metrics are susceptible to strategic platform manipulation, proposing a more robust 'semantic-envelope' metric that better certifies genuine harm reduc…

View →
cs.CRcs.SERecentMay 4, 2026

A Validated Prompt Bank for Malicious Code Generation: Separating Executable Weapons from Security Knowledge in 1,554 Consensus-Labeled Prompts

Richard J. Young, Gregory D. Moody

The paper introduces a validated, consensus-labeled prompt bank that separates requests for executable malicious code (weapons) from requests for general harmful security knowledge, providing a more g…

View →
cs.HCcs.CRRecentMay 22, 2026

From Preventive to Reactive: How AI Coding Assistants Transform Developers' Security Awareness

Faisal Haque Bappy, Tahrim Hossain, Sidratul Muntaher Meheraj, Annoor Sharara Akhand +4 more

The paper investigates how AI coding assistants shift developers' security focus from proactive prevention to reactive review, finding that this structural change is reinforced by current tool interac…

View →
cs.CRRecentMay 23, 2026

Reframing LLM Agent Security as an Agent-Human Interaction Problem

Peiran Wang, Ying Li, Yuan Tian

The paper argues that LLM agent security is fundamentally an agent-human interaction (AHI) problem, demonstrating that industry practices rely on human-centric mechanisms while academic research focus…

View →
cs.CRRecentMay 15, 2026

From AI-Generated Content to Agentic Action: Security and Safety Threats in Generative AI

Zelin Zhang, Qi Li, Jie Cao, Lingshuang Liu +1 more

The paper analyzes the escalating security and safety threats posed by generative AI systems as they transition from merely generating content to executing real-world actions via tools and agents, fin…

View →
cs.CRRecentJun 4, 2026

Exploring the connection between coding habits and cognitive styles in malware developers

Vasilis Vouvoutsis, Constantinos Patsakis, Fran Casino

The study analyzes coding patterns in malware versus benign software, finding that malware code is optimized for quick evasion and secrecy rather than maintainability, though its metrics are not uniqu…

View →
cs.CRRecentMay 29, 2026

A Moderatorless Protocol for WEREWOLF

Naoki Kitamura, Hironori Kiya, Hirotaka Ono

The paper presents a complete, moderatorless protocol for playing Werewolf using only ordinary playing cards, eliminating the need for a trusted third party or digital devices.

View →
cs.CLcs.AIcs.HCRecentMay 28, 2026

EUDAIMONIA: Evaluating Undesirable Dynamics in AI

Jun Rui Huang, Wang Bill Zhu, Ziyi Liu, Nathanael Fast +2 more

The paper introduces EUDAIMONIA, a new framework and benchmark for evaluating how well LLMs align with user welfare in social interactions, finding that even state-of-the-art models frequently violate…

View →
cs.CRRecentMar 25, 2026

An Empirical Analysis of Google Play Data Safety Disclosures: A Consistency Study of Privacy Indicators in Mobile Gaming Apps

Bakheet Aljedaani

This study empirically analyzed 41 mobile gaming apps, finding that while device ID disclosures were relatively consistent, location and personal information disclosures showed significant mismatches…

View →
cs.CRcs.GTRecentMay 11, 2026

Cybercrime and Prevention: Colonel Blotto in Social Engineering

Gergely Benkő, Katalin Parti, Gergely Biczók

This paper uses Colonel Blotto game models, grounded in Routine Activity Theory, to determine the optimal allocation of defensive resources against social engineering attacks, providing data-driven de…

View →
cs.CRcs.AIcs.CVRecentMay 11, 2026

BEACON: A Multimodal Dataset for Learning Behavioral Fingerprints from Gameplay Data

Ishpuneet Singh, Gursmeep Kaur, Uday Pratap Singh Atwal, Guramrit Singh +2 more

The paper introduces BEACON, a large-scale, multimodal dataset capturing diverse behavioral signals from competitive Valorant gameplay, designed for rigorous testing of continuous authentication and b…

View →
cs.CRcs.AIcs.CLRecentApr 20, 2026

Different Paths to Harmful Compliance: Behavioral Side Effects and Mechanistic Divergence Across LLM Jailbreaks

Md Rysul Kabir, Zoran Tiganj

The paper investigates how different methods of jailbreaking large language models (SFT, RLVR, and abliteration) lead to vastly different behavioral and mechanistic failures, even when all methods ach…

View →
cs.AIcs.CYq-fin.RMRecentMay 27, 2026

The Ethics of LLM Sandbox and Persona Dynamics

Tim Gebbie, Stewart Gebbie

The paper argues that LLM guardrails and persona dynamics create an unethical 'reality gap' by laundering epistemic risk onto users, advocating for task-level causal requirements over response-level m…

View →
cs.CRcs.AIRecentApr 12, 2026

The Blind Spot of Agent Safety: How Benign User Instructions Expose Critical Vulnerabilities in Computer-Use Agents

Xuwei Ding, Skylar Zhai, Linxin Song, Jiate Li +5 more

The paper introduces OS-BLIND, a benchmark demonstrating that current safety evaluations fail to detect critical vulnerabilities in computer-use agents when user instructions are benign, showing high…

View →
cs.CRcs.AIRecentApr 16, 2026

HarmfulSkillBench: How Do Harmful Skills Weaponize Your Agents?

Yukun Jiang, Yage Zhang, Michael Backes, Xinyue Shen +1 more

This paper presents HarmfulSkillBench, a large-scale benchmark demonstrating that even small percentages of publicly available skills can be misused for harmful actions, significantly lowering LLM ref…

View →