Xin Li
29 indexed papers
Publications per year
Top categories
Frequent co-authors
Research Timeline
This paper introduces the concept of 'Sleeper Attack,' demonstrating that adversarial content can persist across multiple interactions with an LLM agent, posing a more subtle and difficult-to-detect safety threat than single-interaction attacks.
MACReD introduces a hierarchical multi-agent framework that achieves state-of-the-art performance in parsing complex chemical reaction diagrams by coordinating specialized agents for perception and global reasoning.
The paper proposes a Cycle-Space Detector (CSD) that uses network topology constraints to effectively detect stealthy, data-driven False Data Injection Attacks (FDIA) that exploit the null space of measurement data.
The paper introduces $ ext{RLR}^3$, a novel framework that extends verifiable rewards in Reinforcement Learning to handle partially verifiable, multi-criteria vision-language tasks by integrating robust rubric scoring.
CRITIC-R1 introduces a structured critic framework that treats RAG critique as an explicit error diagnosis problem using reinforcement learning, significantly improving answer quality over strong RAG baselines.
The paper introduces polynomial representations as a quantitative, distribution-aware metric for measuring model simplicity, demonstrating that the effective degree of this representation is a superior predictor of generalization compared to existing proxies.
AliMark proposes a novel watermarking framework that treats sentence-level watermarking as a bit sequence alignment problem, significantly enhancing robustness against structural text perturbations like sentence splitting and merging.
AliMark proposes a novel framework that enhances the robustness of sentence-level watermarking by reformulating the problem as a bit sequence encoding and alignment task, significantly improving resilience against structural text perturbations like sentence splitting and merging.
The paper proposes the Morlet Spectral Transformer (MST), a novel architecture that effectively decodes cross-subject emotion from EEG by designing specialized spectral and spatial representations, outperforming existing large foundation models.
The paper introduces Moment-Video, a new benchmark that diagnoses the ability of video MLLMs to understand brief, critical visual events, revealing that current models struggle significantly with temporal fidelity.
InfoMerge is a novel, training-free method that significantly compresses visual tokens for Video-LLMs by estimating temporal redundancy and allocating tokens based on content richness, achieving high efficiency with minimal performance loss.
The paper introduces SPADE-Bench, a new benchmark designed to rigorously evaluate 'agent deception'—the divergence between an agent's reported plan and its actual executed actions—which is a critical safety issue for autonomous LLM agents.
The paper proposes Credit-Attenuated Privileged Feedback (CAPF), a training-time mechanism that uses verifier-side information to guide LLM search agents, significantly improving their performance on complex QA tasks.
The paper proposes Joint Neighborhood Optimization (JNO), a novel knowledge-editing framework that jointly addresses the coupled pressures of desirable knowledge propagation and unintended knowledge leakage during single-edit updates in LLMs.
The JAMEL framework addresses the challenge of effective exploration in open-ended environments by jointly training agent memory and exploration policies using natural, novelty-driven signals.
The paper analyzes information-sharing mechanisms in oligopolies, finding that privacy protection alone is insufficient to incentivize suppliers to share data; successful sharing requires combining privacy safeguards with a sufficiently informative external signal.
QUBRIC introduces a co-design framework that simultaneously optimizes queries and rubrics, overcoming the bottleneck of vague rubrics derived from open-ended questions, leading to significant gains in RL performance.
MLEvolve is a novel self-evolving multi-agent framework that enables LLM agents to discover and optimize machine learning algorithms for complex, long-horizon tasks.
The paper proposes OneReason, a framework that enhances the reasoning capability of generative recommendation models by focusing on improving item perception and structuring user behavior into coherent latent interests.
This paper introduces CORE-Bench, a comprehensive benchmark for code retrieval in agentic coding.
Papers
CORE-Bench: A Comprehensive Benchmark for Code Retrieval in the Era of Agentic Coding
Fuwei Zhang, Yanzhao Zhang, Mingxin Li, Dingkun Long +4 more
This paper introduces CORE-Bench, a comprehensive benchmark for code retrieval in agentic coding.