Yi Liu
23 indexed papers
Research Timeline
This paper provides the first comprehensive systematization and large-scale empirical evaluation of existing LLM-based Automated Penetration Testing (AutoPT) frameworks, offering a structured taxonomy and unified benchmark for the field.
This paper presents a black-box membership inference attack (MIA) against Video Large Language Models (VideoLLMs) that analyzes generation behavior across varying decoding temperatures, demonstrating that these models are vulnerable.
The paper introduces OverEager-Gen, a new benchmark that measures 'overeager actions' (cases where coding agents perform unauthorized tasks beyond a benign request), and finds that removing explicit consent declarations significantly increases this overeager behavior across multiple agents.
The paper proposes RADAR, a novel graph-based framework that dynamically defends Retrieval-Augmented Generation (RAG) systems against evolving adversarial attacks while minimizing storage overhead.
The paper advocates for integrating explicit contextual feedback (like reviews and comments) into LLM-based recommender systems to achieve more personalized, transparent, and semantically aligned recommendations.
SSR3D-LLM introduces a structured spatial reasoning interface for unified 3D-LLMs, allowing fine-grained object grounding by generating and processing sequential latent spatial steps.
The paper introduces SNARE, a novel adaptive testing pipeline that systematically measures overeager behavior in coding agents, finding that the agent framework accounts for the majority of the variation in security risk.
The paper introduces MIRAGE, a novel pipeline that generates context-aware prompt injection attacks by injecting malicious text into user-generated content regions of mobile screenshots, successfully demonstrating the vulnerability of current GUI agents.
VCap introduces a novel Witness-Adjudicator reward mechanism that provides highly precise, factually grounded feedback for visual captioning, enabling state-of-the-art performance in RL-trained multimodal models.
The paper introduces a unified framework to fairly evaluate LLM agentic capabilities by standardizing diverse benchmarks and separating the effects of the LLM from those of the surrounding framework and environment.
The paper introduces SNARE, a novel adaptive benchmarking pipeline that systematically measures overeager behavior in coding agents, finding that the agent framework accounts for the majority of the variation in security risk.
The paper introduces MIRAGE, a novel pipeline that generates context-aware prompt injection attacks by embedding malicious text into user-generated content regions of mobile screenshots, successfully demonstrating the vulnerability of current VLM-driven GUI agents.
The paper introduces EUDAIMONIA, a new framework and benchmark for evaluating how well LLMs align with user welfare in social interactions, finding that even state-of-the-art models frequently violate social-alignment requirements.
LoopFM proposes a novel framework to significantly improve knowledge distillation for recommendation systems by structuring the rich intermediate embeddings of large foundation models as input features, thereby overcoming the limitations of single-scalar prediction transfer.
The paper introduces a distribution-free statistical framework that allows existing rewrite-based detectors to achieve finite-sample False Discovery Rate (FDR) guarantees for detecting LLM-generated text without requiring model retraining.
The paper proposes DARTS, a distribution-aware active rollout trajectory shaping method that fundamentally accelerates LLM reinforcement learning by actively shaping the long-tail response distribution towards conciseness and certainty.
CamGeo is a novel framework that improves sparse camera-conditioned image-to-video generation by distilling rich 3D geometric priors into the diffusion backbone, resulting in geometrically consistent motion.
The paper proposes DAG-MoE, a novel sparse Mixture-of-Experts framework that replaces standard weighted-sum aggregation with structural aggregation to enhance model performance and enable multi-step reasoning.
The paper establishes that finding approximate Hylland-Zeckhauser equilibria (a type of market allocation) is computationally hard, showing specifically that the problem is PPAD-hard under certain complexity assumptions.
OmniOPD introduces a logit-free, chunk-level distillation framework that improves on standard On-Policy Distillation by using semantic similarity and peak-entropy scheduling, achieving state-of-the-art performance even with black-box teachers.
Papers
DAG-MoE: From Simple Mixture to Structural Aggregation in Mixture-of-Experts
Jiarui Feng, Hanqing Zeng, Karish Grover, Ruizhong Qiu +10 more
The paper proposes DAG-MoE, a novel sparse Mixture-of-Experts framework that replaces standard weighted-sum aggregation with structural aggregation to enhance model performance and enable multi-step r…