Built with and by Teycir Ben Soltane•
How to Use•FAQ•GitHub•arXiv.org•
Share:
ArXivCSExplorer
☆☆Bookmarks🏆RSSHow to UseFAQ
Home/Authors/Alina Fastowski

Alina Fastowski

1 indexed paper

Recent (6 mo)
1
With code
0
Influential cites
0
Benchmarked
0

Publications per year

1
26

Top categories

Crypto×1NLP×1

Frequent co-authors

Yuxiao Li1×
Efstratios Zaradoukas1×
Bardh Prenkaj1×
Gjergji Kasneci1×

Research Timeline

2026
Analysing the Safety Pitfalls of Steering Vectors

This paper systematically audits the safety implications of activation steering vectors, finding that these vectors significantly influence the success rate of jailbreak attacks by overlapping with latent refusal directions.

Highlighted terms show continued research focus across papers

Papers

cs.CRcs.CLRecentMar 25, 2026

Analysing the Safety Pitfalls of Steering Vectors

Yuxiao Li, Alina Fastowski, Efstratios Zaradoukas, Bardh Prenkaj +1 more

This paper systematically audits the safety implications of activation steering vectors, finding that these vectors significantly influence the success rate of jailbreak attacks by overlapping with la…

View →