Gjergji Kasneci

2 indexed papers

Top categories

NLP ×2, ML ×1, Crypto ×1

Frequent co-authors

Zheyu Zhang (1×)

Shuo Yang (1×)

Yuxiao Li (1×)

Alina Fastowski (1×)

Efstratios Zaradoukas (1×)

Bardh Prenkaj (1×)

Research Timeline

2026

Analysing the Safety Pitfalls of Steering Vectors

This paper systematically audits the safety implications of activation steering vectors, finding that these vectors significantly influence the success rate of jailbreak attacks by overlapping with latent refusal directions.

Consolidating Rewarded Perturbations for LLM Post-Training

The paper introduces CoRP, a gradient-free operator that consolidates the benefits of ensemble-based post-training methods into a single, deployable model update, significantly improving performance with minimal computational overhead.

Papers

cs.CL · cs.LG · Recent · May 29, 2026

Consolidating Rewarded Perturbations for LLM Post-Training

Zheyu Zhang, Shuo Yang, Gjergji Kasneci

cs.CR · cs.CL · Recent · Mar 25, 2026