Jie Chen

15 indexed papers

Recent (6 mo)

With code

Influential cites

Benchmarked

Publications per year

Top categories

AI×11NLP×6ML×4Crypto×4Vision×3Info Retrieval×2Software Eng.×2Multimedia×1

Frequent co-authors

Hanjie Chen3×

Kaijie Chen3×

Junjie Chen2×

Zilin Xiao1×

Qi Ma1×

Chun-cheng Jason Chen1×

Research Timeline

2026

TEMPLATEFUZZ: Fine-Grained Chat Template Fuzzing for Jailbreaking and Red Teaming LLMs

TEMPLATEFUZZ is a fine-grained fuzzing framework that systematically tests chat templates to find vulnerabilities in LLMs, achieving high jailbreak success rates with minimal performance degradation.

CSC: Turning the Adversary's Poison against Itself

The paper proposes Cluster Segregation Concealment (CSC), a novel defense that identifies and neutralizes backdoor triggers by relabeling poisoned samples to a virtual class, achieving near-zero attack success rates with minimal accuracy loss.

When Embedding-Based Defenses Fail: Rethinking Safety in LLM-Based Multi-Agent Systems

This paper analyzes the failure of current embedding-based defenses in multi-agent LLM systems and proposes using token-level confidence scores (logits) for improved robustness.

MemMark: State-Evolution Attribution Watermarking for Agent Long-Term Memory Systems

MemMark introduces a state-evolution attribution watermark that embeds owner-controlled signals into latent memory-write decisions, enabling robust provenance tracking for agent memory even when all traditional logs and metadata are lost.

GUI Agents for Continual Game Generation

The paper proposes using GUI agents, both as objective evaluators and subjective playtesters, to significantly improve the generation of playable games from prompts, demonstrating a 66.8% rubric pass-rate with a novel iterative framework.

Multimodal Music Recommendation System using LLMs

The paper proposes a novel multimodal framework for session-based music recommendation that jointly models audio, lyric, and semantic content signals within a unified LLM-based sequential reasoning system.

Conformal Certification of Reasoning Trace Prefixes

The paper introduces CROP, a novel conformal procedure that provides rigorous statistical guarantees for certifying the longest safe prefix of a language model's reasoning trace, allowing for targeted error identification and repair.

AnyMo: Scaling Any-Modality Conditional Motion Generation with Masked Modeling

The paper introduces AnyMo, a unified multimodal framework that enables high-quality, scalable conditional human motion generation by leveraging a massive, cross-modal dataset and a masked modeling transformer.

DynSess: Dynamic Session-Level Evaluation and Optimization Framework for Role-Playing Agents

The paper introduces DynSess, a novel session-level framework that evaluates and optimizes role-playing agents by assessing long-horizon conversational quality, significantly outperforming existing turn-level methods.

FlowTime: Towards Continuous Generative Watch Time Prediction via Flow-based Personalized Priors

FlowTime proposes a novel Continuous Generative Regression framework using a Flow-based Personalized Prior to accurately model the multimodal and heterogeneous nature of user watch time prediction, significantly outperforming existing state-of-the-art methods.

On the Scaling of PEFT: Towards Million Personal Models of Trillion Parameters

The paper reframes Parameter-Efficient Fine-Tuning (PEFT) from a mere cost-saving alternative to a robust architecture for creating persistent, personalized models that layer specific behaviors onto large shared foundation models.

STaR-KV: Spatio-Temporal Adaptive Re-weighting for KV Cache Compression in GUI Vision-Language Models

STaR-KV introduces a novel, training-free KV cache compression framework that adaptively re-weights token importance across spatial, temporal, and distributional axes, significantly reducing GPU memory usage for GUI vision-language models while maintaining high accuracy.

Benchmarking LLM-as-a-Judge for Long-Form Output Evaluation

The paper introduces LongJudgeBench, a new benchmark designed to evaluate the reliability of LLM judges specifically for complex, long-form output evaluation, revealing significant instability gaps in current LLM judging methods.

OneReason Technical Report

The paper proposes OneReason, a framework that enhances the reasoning capability of generative recommendation models by focusing on improving item perception and structuring user behavior into coherent latent interests.

Learning to Reason by Analogy via Retrieval-Augmented Reinforcement Fine-Tuning

This paper proposes a post-training framework called Retrieval-Augmented Reinforcement Fine-Tuning (RA-RFT) to teach language models to reason by analogy.

Highlighted terms show continued research focus across papers

Papers

cs.CLcs.AIEmpiricalRecentJun 11, 2026

Learning to Reason by Analogy via Retrieval-Augmented Reinforcement Fine-Tuning

Zilin Xiao, Qi Ma, Chun-cheng Jason Chen, Xintao Chen +3 more

This paper proposes a post-training framework called Retrieval-Augmented Reinforcement Fine-Tuning (RA-RFT) to teach language models to reason by analogy.

View →

cs.IRcs.AIcs.CLRecent