Ruoxi Jia

3 indexed papers

Top categories

Crypto ×3, AI ×2, ML ×1, NLP ×1

Frequent co-authors

Mahavir Dabas (1×)

Jihyun Jeong (1×)

Ming Jin (1×)

Jinhu Qi (1×)

Muzhi Li (1×)

Jiahong Liu (1×)

Research Timeline

2026

Mitigating Many-shot Jailbreak Attacks with One Single Demonstration

The paper proposes appending a single, fixed safety demonstration at inference time to counter the progressive degradation of safety that many-shot jailbreak attacks cause in language models.

Towards trustworthy agentic AI: a comprehensive survey of safety, robustness, privacy, and system security

This survey provides a comprehensive, practical guide to ensuring the trustworthiness of complex, autonomous agentic AI systems by focusing on safety, robustness, privacy, and system security.

Memory-Induced Tool-Drift in LLM Agents

The paper identifies 'memory-induced tool-drift,' a systematic vulnerability where personality biases stored in an LLM agent's memory silently corrupt tool-calling decisions, even when those biases are irrelevant to the task.

Papers

cs.CR, cs.LG · Recent · May 24, 2026

Memory-Induced Tool-Drift in LLM Agents

Mahavir Dabas, Jihyun Jeong, Ming Jin, Ruoxi Jia

cs.AI, cs.CL, cs.CR · Recent · May 17, 2026