Jiacheng Li
6 indexed papers
Research Timeline
ARES is a framework that systematically discovers and mitigates vulnerabilities in both parts of an RLHF system by testing the core LLM and its Reward Model (RM) together with structured adversarial prompts, improving safety robustness.
The paper introduces MAGE, a novel defensive framework that uses a dedicated 'shadow memory' to proactively detect and mitigate long-horizon threats against LLM agents during complex, multi-step interactions.
The paper introduces LLMSurgeon, a framework that estimates the domain-level data mixture of a Large Language Model (LLM) using only generated text, thereby providing a post-hoc method to audit the model's 'digital DNA'.
The paper proposes VLBM, a latent basis modeling framework, to achieve state-of-the-art robustness in multivariate time series forecasting, particularly when facing rare but high-impact out-of-distribution (OOD) events.
The JAMEL framework addresses the challenge of effective exploration in open-ended environments by jointly training agent memory and exploration policies using natural, novelty-driven signals.
The paper introduces OpAI-Bench, a benchmark for studying how AI authorship signals evolve and accumulate as humans and AI progressively co-edit a text.
Papers
Operation-Guided Progressive Human-to-AI Text Transformation Benchmark for Multi-Granularity AI-Text Detection