Built with and by Teycir Ben Soltane•
How to Use•FAQ•GitHub•arXiv.org•
Share:
ArXivCSExplorer
☆☆Bookmarks🏆RSSHow to UseFAQ
Home/Authors/Ruoxi Jia

Ruoxi Jia

3 indexed papers

Recent (6 mo)
3
With code
0
Influential cites
0
Benchmarked
0

Publications per year

3
26

Top categories

Crypto×3AI×2ML×1NLP×1

Frequent co-authors

Mahavir Dabas1×
Jihyun Jeong1×
Ming Jin1×
Jinhu Qi1×
Muzhi Li1×
Jiahong Liu1×

Research Timeline

2026
Mitigating Many-shot Jailbreak Attacks with One Single Demonstration

The paper proposes mitigating the progressive degradation of safety in language models caused by many-shot jailbreak attacks by appending a single, fixed safety demonstration at inference time.

Towards trustworthy agentic AI: a comprehensive survey of safety, robustness, privacy, and system security

This survey provides a comprehensive, practical guide to ensuring the trustworthiness of complex, autonomous agentic AI systems by focusing on safety, robustness, privacy, and system security.

Memory-Induced Tool-Drift in LLM Agents

The paper identifies 'memory-induced tool-drift,' a systematic vulnerability where personality biases stored in an LLM agent's memory silently corrupt tool-calling decisions, even when those biases are irrelevant to the task.

Highlighted terms show continued research focus across papers

Papers

cs.CRcs.LGRecentMay 24, 2026

Memory-Induced Tool-Drift in LLM Agents

Mahavir Dabas, Jihyun Jeong, Ming Jin, Ruoxi Jia

The paper identifies 'memory-induced tool-drift,' a systematic vulnerability where personality biases stored in an LLM agent's memory silently corrupt tool-calling decisions, even when those biases ar…

View →
cs.AIcs.CLcs.CRRecentMay 17, 2026

Towards trustworthy agentic AI: a comprehensive survey of safety, robustness, privacy, and system security

Jinhu Qi, Muzhi Li, Jiahong Liu, Yuqin Shu +8 more

This survey provides a comprehensive, practical guide to ensuring the trustworthiness of complex, autonomous agentic AI systems by focusing on safety, robustness, privacy, and system security.

View →
cs.CRcs.AIRecentMay 8, 2026

Mitigating Many-shot Jailbreak Attacks with One Single Demonstration

Kejia Chen, Jiawen Zhang, Boheng Li, Pengcheng Li +5 more

The paper proposes mitigating the progressive degradation of safety in language models caused by many-shot jailbreak attacks by appending a single, fixed safety demonstration at inference time.

View →