Liang Li
10 indexed papers
Publications per year
Top categories
Frequent co-authors
Research Timeline
The paper proposes a novel Text-Guided Backdoor (TGB) attack that uses common words in text descriptions as stealthy triggers for multimodal models, enhancing practicality and controllability.
The paper introduces ProjLens, an interpretability framework that reveals that backdoor vulnerabilities in Multimodal Large Language Models (MLLMs) are encoded within a low-rank subspace of the projector, causing a measurable semantic shift in poisoned inputs.
DCVD proposes a dual-channel cross-modal fusion framework that jointly detects software vulnerabilities and precisely localizes the vulnerable lines, outperforming existing state-of-the-art methods.
The paper introduces Behavioral Integrity Verification (BIV), a framework that systematically audits AI agent skills by comparing their declared capabilities against their actual implementation, revealing a high rate of behavioral deviation.
The EvoSafety framework enhances LLM safety by externalizing attack and defense mechanisms, enabling persistent, transferable, and model-agnostic robustness against adversarial prompts.
This paper introduces a new benchmark to test Tool Description Poisoning (TDP) attacks on LLM agents, demonstrating that even advanced models like GPT-4o are highly vulnerable and that current defenses are often ineffective.
The paper introduces Adaptive Context Management (AdaCoM), an external context manager that uses reinforcement learning to improve the performance of frozen LLM agents on long-horizon tasks by intelligently managing and pruning accumulated context.
This paper introduces a novel attack, RA-ICA, that targets RAG-enhanced LLMs by poisoning external knowledge bases to drastically increase inference costs, achieving up to a 13.12x increase in token consumption.
THRD introduces a novel, training-free framework that models temporal risk accumulation to effectively defend against multi-turn jailbreak attacks on LLMs, significantly reducing attack success rates while maintaining model utility.
The paper introduces DocFormBench, a new benchmark for content-aware document formatting, and proposes DocFormFlow, a workflow that improves formatting accuracy and efficiency by decoupling target localization from modification execution.
Papers
THRD: A Training-Free Multi-Turn Defense Framework for Jailbreak Attacks on Large Language Models
Zhiqing Ma, Zhonghao Xu, Dong Yu, Chen Kang +2 more
THRD introduces a novel, training-free framework that models temporal risk accumulation to effectively defend against multi-turn jailbreak attacks on LLMs, significantly reducing attack success rates…