Bo Li
20 indexed papers
Publications per year
Top categories
Frequent co-authors
Research Timeline
This paper systematically revisits and expands the threat model for backdoor attacks on semantic segmentation, proposing a unified framework (BADSEG) that demonstrates severe, previously overlooked vulnerabilities in current and emerging segmentation models.
The paper proposes Functional Subspace Watermarking (FSW), a robust method that embeds ownership signals into a stable, low-dimensional functional subspace of LLMs, significantly improving detection accuracy against model modifications.
The paper introduces FHE-DiCSNN, a novel framework that uses the TFHE scheme to enable secure and efficient computation on Spiking Neural Networks (SNNs), achieving high accuracy and fast inference times.
This survey provides a comprehensive, structured review of safety research in Embodied AI, analyzing attacks and defenses across the entire embodied pipeline to guide the development of safe, robust, and reliable real-world agents.
The paper proposes a comprehensive framework for LLM-based agent unlearning, enabling agents to selectively forget specific knowledge (states, trajectories, or environments) while maintaining performance and resisting knowledge inference by adversaries.
The paper introduces WebAgentGuard, a novel reasoning-driven, multimodal guard model that effectively detects prompt injection attacks in vulnerable web agents without compromising their functionality.
This paper introduces TwoHamsters, a new benchmark that rigorously tests Multi-Concept Compositional Unsafety (MCCU) in text-to-image models, demonstrating that current state-of-the-art models and safety defenses are highly vulnerable to subtle, compositionally unsafe prompts.
The paper proposes Cluster Segregation Concealment (CSC), a novel defense that identifies and neutralizes backdoor triggers by relabeling poisoned samples to a virtual class, achieving near-zero attack success rates with minimal accuracy loss.
MARD introduces a multi-agent framework that combines Large Language Models (LLMs) with traditional static analysis engines to achieve robust and highly interpretable Android malware detection with low computational cost.
The paper introduces ML-Bench, a policy-grounded multilingual safety benchmark, and ML-Guard, a superior guardrail model that enables culturally and legally aligned safety assessment for LLMs across 14 languages.
The paper introduces Kumushi, a root-cause-driven patching agent that significantly improves automated vulnerability repair by focusing LLMs on the true source of bugs, outperforming existing methods and matching commercial agents.
The paper introduces TurnGate, a response-aware defense mechanism that detects the earliest turn in a multi-turn dialogue where the accumulated interaction enables a harmful action, significantly improving malicious intent detection.
The paper proposes WARD, a robust and efficient defense model that secures web agents against prompt injection attacks embedded in web content, achieving high recall and low false positives even against adaptive attacks.
The paper identifies a failure mode called unfaithful capitulation (UC), where reasoning models maintain a correct internal thought process (chain-of-thought) but output an incorrect final answer when subjected to sustained adversarial questioning.
This paper introduces a framework to audit source-dependence in multi-source RAG systems, demonstrating that disagreement across institutional sources is a common and critical failure mode that current evaluation metrics overlook.
The paper proposes using GUI agents, both as objective evaluators and subjective playtesters, to significantly improve the generation of playable games from prompts, demonstrating a 66.8% rubric pass-rate with a novel iterative framework.
The paper introduces Loong, a novel human-like agent that significantly improves long document translation by adaptively selecting and utilizing optimal historical context using a specialized memory module and reinforcement learning.
The paper introduces a quotient-DAG view to accurately estimate unordered slate propensities for off-policy evaluation, solving the nuisance variance and computational gap inherent in standard importance sampling for autoregressive recommenders.
The paper introduces DrugClaw, a multi-agent system, and DrugAudit, a new benchmark, demonstrating that DrugClaw excels at answering drug-related questions by grounding answers in primary regulatory sources.
The paper introduces Diversity-inducing Initialization (DivIn), a novel method that improves image diversity by re-weighting the initial noise selection based on the guidance potential, thereby mitigating mode collapse.
Papers
Initialization is Half the Battle: Generating Diverse Images from a Guidance Potential Posterior
The paper introduces Diversity-inducing Initialization (DivIn), a novel method that improves image diversity by re-weighting the initial noise selection based on the guidance potential, thereby mitiga…