Robust Safety Monitoring of Language Models via Activation Watermarking