Qi Li
34 indexed papers
Research Timeline
The paper introduces AgentWard, a lifecycle-oriented, defense-in-depth architecture designed to systematically secure autonomous AI agents by protecting them across all stages of their operation.
MelShield is a robust, in-generation audio watermarking framework that embeds identifiable signals into AI-generated speech in the Mel-spectrogram domain for reliable copyright protection and attribution.
The paper introduces PhishEye, a fully dynamic self-supervised system that models Ethereum transactions as a heterogeneous temporal attributed multi-graph and uses temporal graph contrastive learning to achieve high accuracy in detecting phishing activities.
AFL-ICP is a novel specification-driven fuzzing framework that significantly enhances the security testing of industrial control protocols by detecting subtle semantic and logic bugs missed by traditional fuzzers.
The paper analyzes the escalating security and safety threats posed by generative AI systems as they transition from merely generating content to executing real-world actions via tools and agents, finding that current defenses lag behind capability deployment.
The paper introduces EgoBench, the first interactive multimodal benchmark designed to jointly evaluate advanced AI agents' capabilities in visual perception, multi-hop reasoning, and dynamic tool usage in real-world, egocentric scenarios.
DeepSurvey is an agentic system that significantly enhances automated survey generation by extracting deep, structured knowledge from full-text papers and rigorously validating citations, achieving superior content depth and reliability compared to existing methods.
Xetrieval introduces an embedding-level framework to mechanistically explain dense retrieval decisions by decomposing high-dimensional embeddings into sparse, human-interpretable features.
The paper introduces PassNet, a large-scale ecosystem for generating compiler passes using LLMs, demonstrating that LLMs can significantly accelerate graph compilation for long-tail workloads, suggesting that consistency is the primary bottleneck.
The paper proposes EKSFT, a selective fine-tuning method that masks high-entropy or high-KL divergence tokens during Supervised Fine-Tuning (SFT) to prevent distribution shift and improve subsequent Reinforcement Learning (RL) performance.
The paper introduces the Configurable Safety Reward Model (CSRM), a novel reward model that can be jointly optimized for calibrated safety compliance and reward modeling, significantly improving LLM safety alignment across diverse and unseen safety configurations.
The paper introduces SURE, a unified framework designed to standardize and improve the comparability and reproducibility of evaluations for advanced speech understanding models.
The paper introduces SAVE, a framework that uses on-policy feedback and the value function to self-supervise and improve reward models, significantly enhancing RLHF performance across multiple benchmarks.
The paper introduces I-WebGenBench, a framework and benchmark that converts static scientific papers into executable, interactive web systems, allowing users to dynamically explore the paper's mechanisms.
The paper demonstrates that specialized coding agents, using only text and image access within a sandbox, can effectively solve complex omnimodal tasks, often outperforming state-of-the-art native omnimodal models.
The paper introduces AdvCL, a framework that repurposes adversarial perturbations as a geometric control signal to stabilize continual learning in large language models, significantly reducing forgetting and enhancing robustness.
The paper proposes Self-Adaptive Monotonic Normalization (SAMN), a hyperparameter-friendly method that improves long-tailed recognition by enforcing monotonicity on per-class weight norms without requiring parameter regularization.
The paper introduces AutoMedBench, a novel workflow-aware benchmark that evaluates autonomous medical-AI agents across a five-stage research process, revealing that agents struggle most with validation and submission.
TempoVLA is a novel Vision-Language-Action model that enables controllable execution speed for robot manipulation by explicitly conditioning the policy on the desired speed.
The paper introduces RedEdit, an agentic red-teaming framework that demonstrates that malicious images can be easily edited to bypass safety classifiers while retaining their harmful semantics.
Papers
TempoVLA: Learning Speed-Controllable Vision-Language-Action Policies
Dong Jing, Jingchen Nie, Tianqi Zhang, Jiaqi Liu +3 more
TempoVLA is a novel Vision-Language-Action model that enables controllable execution speed for robot manipulation by explicitly conditioning the policy on the desired speed.