Xiaodong Li
2 indexed papers
Research Timeline
The paper proposes DAMPER, a domain-aware framework that autonomously extracts and rewrites private information from text while providing rigorous differential privacy guarantees, significantly improving the privacy-utility trade-off.
The paper introduces a framework using the 'behavioral geometry' of model populations to efficiently predict jailbreak susceptibility and transfer defenses, achieving high accuracy with significantly fewer evaluations.
Papers
Jailbreak susceptibility prediction and mitigation via the behavioral geometry of models