ArXivCSExplorer
☆☆Bookmarks🏆RSSHow to UseFAQ
Built with and by Teycir Ben Soltane•
How to Use•FAQ•GitHub•arXiv.org•
Share:

~ similar to 2604.27238v1· 20 results

cs.CRcs.LGRecentMay 28, 2026

Dissecting the Black Box: Circuit-Level Analysis of LLM Vulnerability Detection

Syafiq Al Atiiq, Chun Zhou, Christian Gehrmann

The paper analyzes LLM vulnerability detection using mechanistic interpretability, finding that models primarily rely on safety detectors rather than direct vulnerability signature recognition.

View →
cs.CRRecentApr 18, 2026

HarmChip: Evaluating Hardware Security Centric LLM Safety via Jailbreak Benchmarking

Zeng Wang, Minghao Shao, Weimin Fu, Prithwish Basu Roy +5 more

The paper introduces HarmChip, a novel benchmark to evaluate LLM vulnerability to domain-specific hardware security threats, revealing that current safety guardrails fail against semantically disguise…

View →
cs.ARcs.AIcs.CRRecentApr 15, 2026

VeriCWEty: Embedding enabled Line-Level CWE Detection in Verilog

Prithwish Basu Roy, Zeng Wang, Anatolii Chuvashlov, Weihua Xiao +3 more

VeriCWEty proposes an embedding-based framework to detect and classify common software vulnerabilities (CWEs) in Verilog RTL code at both module and line levels, achieving high detection accuracy.

View →
cs.CRcs.ARcs.LGRecentMay 11, 2026

LLMs for Secure Hardware Design and Related Problems: Opportunities and Challenges

Johann Knechtel, Ozgur Sinanoglu, Ramesh Karri

This review analyzes the dual impact of integrating Large Language Models (LLMs) into hardware design, detailing both their transformative potential in EDA and the critical security vulnerabilities th…

View →
cs.AIRecentMay 28, 2026

Aligned but Fragile: Enhancing LLM Safety Robustness via Zeroth-Order Optimization

Zhihao Liu, Yifan Wu, Jian Lou, Di Wang +2 more

The paper proposes a novel zeroth-order optimization framework to enhance the robustness of LLM safety alignment, showing that few refinement steps can significantly improve safety while maintaining u…

View →
cs.CRRecentApr 8, 2026

TRUSTDESC: Preventing Tool Poisoning in LLM Applications via Trusted Description Generation

Hengkai Ye, Zhechang Zhang, Jinyuan Jia, Hong Hu

The paper introduces TRUSTDESC, a novel framework that prevents tool poisoning attacks in LLM applications by automatically generating highly accurate and trusted tool descriptions directly from the t…

View →
cs.CRcs.AIcs.LGRecentMay 24, 2026

Security in the Fine-Tuning Lifecycle of Large Language Models: Threats, Defenses,Evaluation, and Future Directions

Wenjuan Li, Yitao Liu, Runze Chen, Rajkumar Buyya

This paper provides a systematic, lifecycle-based framework for analyzing security threats and defenses across the entire fine-tuning process of LLMs, revealing that attack effectiveness is highly mod…

View →
cs.CRcs.LGRecentApr 17, 2026

SafeLM: Unified Privacy-Aware Optimization for Trustworthy Federated Large Language Models

Noor Islam S. Mohammad, Uluğ Bayazıt

SafeLM is a comprehensive framework that jointly addresses privacy, security, misinformation, and adversarial robustness in federated LLMs, achieving high safety performance while significantly reduci…

View →
cs.CRcs.PLcs.SERecentApr 28, 2026

Symbolic Execution Meets Multi-LLM Orchestration: Detecting Memory Vulnerabilities in Incomplete Rust CVE Snippets

Zeyad Abdelrazek, Young Lee

The paper introduces a novel multi-LLM orchestration system combined with symbolic execution to successfully detect memory vulnerabilities in uncompilable, incomplete Rust CVE code snippets, achieving…

View →
cs.CRRecentApr 1, 2026

When Safe Models Merge into Danger: Exploiting Latent Vulnerabilities in LLM Fusion

Jiaqing Li, Zhibo Zhang, Shide Zhou, Yuxi Li +2 more

The paper introduces TrojanMerge, a framework demonstrating that model merging can be exploited to systematically compromise the safety alignment of multiple individually safe LLMs.

View →
cs.CRcs.CVRecentApr 17, 2026

TwoHamsters: Benchmarking Multi-Concept Compositional Unsafety in Text-to-Image Models

Chaoshuo Zhang, Yibo Liang, Mengke Tian, Chenhao Lin +5 more

This paper introduces TwoHamsters, a new benchmark that rigorously tests Multi-Concept Compositional Unsafety (MCCU) in text-to-image models, demonstrating that current state-of-the-art models and saf…

View →
cs.CRcs.AIRecentApr 1, 2026

Automated Framework to Evaluate and Harden LLM System Instructions against Encoding Attacks

Anubhab Sahu, Diptisha Samanta, Reza Soosahabi

The paper introduces an automated framework demonstrating that LLM system instructions are vulnerable to encoding attacks, where structured output requests can bypass safety refusals and leak sensitiv…

View →
cs.DCcs.AIRecentJun 1, 2026

Not All Errors Are Equal: A Systematic Study of Error Propagation in Large Language Model Inference

Yafan Huang, Sheng Di, Guanpeng Li

This paper systematically studies how soft errors propagate during Large Language Model (LLM) inference using a novel fault-injection framework, providing critical insights and mitigation strategies f…

View →
cs.CRcs.SERecentMar 19, 2026

CNT: Safety-oriented Function Reuse across LLMs via Cross-Model Neuron Transfer

Yue Zhao, Yujia Gong, Ruigang Liang, Shenchen Zhu +3 more

The paper introduces Cross-Model Neuron Transfer (CNT), a post-hoc method that efficiently transfers safety-oriented functionalities between different large language models by transferring minimal sub…

View →
cs.PLcs.ARcs.LGRecentJun 4, 2026

CASS-RTL: Correctness-Aware Subspace Steering for RTL Generation with LLMs

Mohammad Akyash, Nowfel Mashnoor, Kimia Azar, Hadi Kamali

The paper introduces CASS-RTL, a novel, model-agnostic framework that enhances the functional correctness of Large Language Models (LLMs) generating Register-Transfer Level (RTL) code by leveraging th…

View →
cs.SEcs.CRRecentMar 18, 2026

Who Tests the Testers? Systematic Enumeration and Coverage Audit of LLM Agent Tool Call Safety

Xuan Chen, Lu Yan, Ruqi Zhang, Xiangyu Zhang

The paper introduces SafeAudit, a meta-audit framework that systematically enumerates test cases and uses a quantitative metric to uncover significant residual unsafe behaviors in LLM agents that exis…

View →
cs.ARRecentMay 27, 2026

FT-Pilot: Automated Fault-Tolerant RTL Rewriting via Vulnerability-Guided LLMs

Weixing Liu, Zizhen Liu, Jing Ye, Naixing Wang +3 more

FT-Pilot is a novel GNN-guided LLM framework that automatically rewrites RTL code to harden digital circuits against soft errors, providing an efficient, automated path for reliability optimization.

View →
cs.SEcs.CRRecentMay 27, 2026

Towards Demystifying and Repairing LLM-in-the-Loop Vulnerabilities

Yujie Ma, Jialin Rong, Chenxi Yang, Lili Quan +3 more

The paper addresses the gap in understanding real-world LLM-in-the-loop vulnerabilities by creating the LLMCVE dataset and demonstrating that these vulnerabilities are significantly harder to repair t…

View →
cs.CRcs.LGRecentApr 22, 2026

Breaking Bad: Interpretability-Based Safety Audits of State-of-the-Art LLMs

Krishiv Agarwal, Ramneet Kaur, Colin Samplawski, Manoj Acharya +5 more

The paper conducts an interpretability-driven safety audit of eight state-of-the-art LLMs, demonstrating that while interpretability-based steering is a powerful auditing tool, model robustness varies…

View →
cs.CRcs.CLRecentApr 9, 2026

The Art of (Mis)alignment: How Fine-Tuning Methods Effectively Misalign and Realign LLMs in Post-Training

Rui Zhang, Hongwei Li, Yun Shen, Xinyue Shen +5 more

The paper investigates how various fine-tuning methods can be used both to intentionally misalign and subsequently realign large language models (LLMs), revealing distinct strengths for attack and def…

View →