ArXivCSExplorer

Xinyue Shen

3 indexed papers

Recent (6 mo): 3
With code: 0
Influential cites: 0
Benchmarked: 0

Publications per year

2026: 3

Top categories

Crypto ×3, AI ×1, NLP ×1

Frequent co-authors

Michael Backes (3×)
Yang Zhang (3×)
Zeyuan Chen (1×)
Yihan Ma (1×)
Yukun Jiang (1×)
Yage Zhang (1×)

Research Timeline

2026
The Art of (Mis)alignment: How Fine-Tuning Methods Effectively Misalign and Realign LLMs in Post-Training

The paper investigates how various fine-tuning methods can be used both to intentionally misalign and subsequently realign large language models (LLMs), revealing distinct strengths for attack and defense mechanisms.

HarmfulSkillBench: How Do Harmful Skills Weaponize Your Agents?

This paper presents HarmfulSkillBench, a large-scale benchmark demonstrating that even small percentages of publicly available skills can be misused for harmful actions, significantly lowering LLM refusal rates when integrated into agent workflows.

Pop Quiz Attack: Black-box Membership Inference Attacks Against Large Language Models

The PopQuiz Attack is a novel black-box membership inference attack that successfully tests whether large language models memorize specific training data by framing the target data as multiple-choice quiz questions.

Papers

cs.CR · Recent · May 7, 2026

Pop Quiz Attack: Black-box Membership Inference Attacks Against Large Language Models

Zeyuan Chen, Yihan Ma, Xinyue Shen, Michael Backes +1 more

cs.CR, cs.AI · Recent · Apr 16, 2026

HarmfulSkillBench: How Do Harmful Skills Weaponize Your Agents?

Yukun Jiang, Yage Zhang, Michael Backes, Xinyue Shen +1 more

cs.CR, cs.CL · Recent · Apr 9, 2026

The Art of (Mis)alignment: How Fine-Tuning Methods Effectively Misalign and Realign LLMs in Post-Training

Rui Zhang, Hongwei Li, Yun Shen, Xinyue Shen +5 more
