Prashant Kulkarni
2 indexed papers
Research Timeline
The paper presents a preliminary safety evaluation of the open-weight LLM Kimi K2.5. It finds that the model, while highly capable, carries concerning dual-use risks, particularly for CBRNE misuse and disinformation, and it recommends mandatory safety testing for future open-weight models.
The paper introduces 'adversarial restlessness,' an activation-level signature in LLM residual streams, to detect multi-turn prompt injection attacks with high accuracy.
Papers
Latent Adversarial Detection: Adaptive Probing of LLM Activations for Multi-Turn Attack Detection