Wei Yang Bryan Lim
2 indexed papers
Research Timeline
The paper proposes SALO, a jailbreak detector that monitors how activations evolve layer by layer (the "Refusal Trajectory"). It aims to be more robust than traditional methods, which rely on a static terminal representation.
The paper introduces FraudBench, a multimodal benchmark for detecting AI-generated fraudulent refund evidence. It finds that current AI models struggle significantly with claim-conditioned fake-damage detection.
Papers
FraudBench: A Multimodal Benchmark for Detecting AI-Generated Fraudulent Refund Evidence
Xinyu Yan, Boyang Chen, Jiaming Zhang, Tiantong Wu +11 more