John T. Halloran

3 indexed papers

Top categories

Crypto ×3, AI ×3, ML ×3

Frequent co-authors

Noopur S. Bhatt (1×)

Research Timeline

2026

Understanding the Effects of Safety Unalignment on Large Language Models

This study compares two methods of safety unalignment, Jailbreak-Tuning (JT) and Weight Orthogonalization (WO), across six LLMs and finds that WO significantly enhances malicious capabilities, making it a greater risk than JT.

Leveraging RAG for Training-Free Alignment of LLMs

The paper introduces RAG-Pref, a novel, training-free Retrieval Augmented Generation (RAG) method for preference alignment that significantly improves LLM refusal guardrails against agentic attacks with minimal computational overhead.

Be Kind, Rewrite: Benign Projections via Rewriting Defend Against LLM Data Poisoning Attacks

The paper proposes Open-Book Benign Rewriting (OBBR), a novel defense mechanism that uses LLM rewriting with benign samples to neutralize data poisoning attacks against LLMs, significantly improving safety performance.

Papers

cs.CR, cs.AI, cs.LG | Recent | May 18, 2026

Be Kind, Rewrite: Benign Projections via Rewriting Defend Against LLM Data Poisoning Attacks

John T. Halloran, Noopur S. Bhatt

cs.LG, cs.AI, cs.CR | Recent | May 11, 2026