Jianwei Li
2 indexed papers
Publications per year
Top categories
Frequent co-authors
Research Timeline
The paper argues that the 'positive backdoor' label should be retired and replaced with 'Secret Alignment,' asserting that all such protective claims require rigorous, standardized evaluation due to inherent brittleness.
The paper argues that the 'positive backdoor' label should be retired and replaced with 'Secret Alignment,' asserting that such protective claims must be rigorously evaluated for security, especially concerning confidentiality, integrity, and availability.
Papers
Position: Retire the "Positive Backdoor" Label -- Secret Alignment Requires Strict and Systematic Evaluation
The paper argues that the 'positive backdoor' label should be retired and replaced with 'Secret Alignment,' asserting that all such protective claims require rigorous, standardized evaluation due to i…