Tao Li
9 indexed papers
Research Timeline
The paper proposes ADAM, a privacy attack that systematically extracts sensitive data from LLM agent memory by adaptively querying the victim agent based on data distribution and entropy.
The paper proposes GREW, a novel Green-REd Watermarking framework that embeds ownership signals into recommender systems' intrinsic ranking process without requiring synthetic data, achieving robust protection against model extraction attacks.
STARE is a hierarchical reinforcement learning framework that treats the entire image generation process (the denoising trajectory) as an attack surface, significantly improving the detection of multi-modal toxicity vulnerabilities in Vision-Language Models.
This paper introduces a new benchmark to test Tool Description Poisoning (TDP) attacks on LLM agents, demonstrating that even advanced models like GPT-4o are highly vulnerable and that current defenses are often ineffective.
This paper provides a systematic, lifecycle-based framework for analyzing security threats and defenses across the entire fine-tuning process of LLMs, revealing that attack effectiveness is highly model-dependent and defenses rarely generalize across different phases.
LoRe is a training-free wrapper that dynamically budgets interaction evaluation at each step of graph solvers, significantly improving scalability and speed while maintaining solution quality.
Qwen-VLA is a unified embodied foundation model that extends vision-language understanding to continuous action generation, enabling robust multi-task generalization across diverse robotic tasks and embodiments.
The paper analyzes token reduction for efficient unified VLM training, finding that while task-specific acceleration saves computation, it destroys the mutual performance gains achieved through joint optimization.
The paper introduces LongJudgeBench, a new benchmark designed to evaluate the reliability of LLM judges specifically for complex, long-form output evaluation, revealing significant instability gaps in current LLM judging methods.
Papers
Benchmarking LLM-as-a-Judge for Long-Form Output Evaluation
Junjie Chen, Yuxi Dong, Haitao Li, Weihang Su +4 more