Sangyeon Yoon

1 indexed paper

Top categories

Crypto ×1, AI ×1

Frequent co-authors

Wonje Jeung ×1

Yoonjun Cho ×1

Dongjae Jeon ×1

Albert No ×1

Research Timeline

2026

Few-Shot Truly Benign DPO Attack for Jailbreaking LLMs

The paper introduces a truly benign Direct Preference Optimization (DPO) attack that jailbreaks large language models (LLMs): fine-tuning on a small set of harmless preference data suppresses the model's refusal behavior, even for malicious prompts.


Papers

cs.CR, cs.AI | Recent | May 9, 2026

Few-Shot Truly Benign DPO Attack for Jailbreaking LLMs

Sangyeon Yoon, Wonje Jeung, Yoonjun Cho, Dongjae Jeon +1 more
