Richard J. Young
3 indexed papers
Research Timeline
The paper introduces a validated, consensus-labeled prompt bank that separates requests for executable malicious code (weapons) from requests for general harmful security knowledge, providing a more granular axis for evaluating AI safety.
This paper systematically reviews thirteen diverse malicious-code prompt corpora used to evaluate LLM refusal, identifying critical methodological gaps in current research.
The paper introduces a large, consensus-labeled prompt bank that reliably distinguishes between requests for executable malicious code and requests for harmful security knowledge, providing a standardized tool for measuring coding-model compliance.
Papers
Code as a Weapon: A Consensus-Labeled Prompt Bank for Measuring Coding-Model Compliance with Malicious-Code Requests