ArXivCSExplorer

Jie Chen

15 indexed papers

Recent (6 mo): 15
With code: 0
Influential cites: 0
Benchmarked: 0

Publications per year

2026: 15

Top categories

AI ×11, NLP ×6, ML ×4, Crypto ×4, Vision ×3, Info Retrieval ×2, Software Eng. ×2, Multimedia ×1

Frequent co-authors

Hanjie Chen (3×)
Kaijie Chen (3×)
Junjie Chen (2×)
Zilin Xiao (1×)
Qi Ma (1×)
Chun-cheng Jason Chen (1×)

Research Timeline

2026
TEMPLATEFUZZ: Fine-Grained Chat Template Fuzzing for Jailbreaking and Red Teaming LLMs

TEMPLATEFUZZ is a fine-grained fuzzing framework that systematically tests chat templates to find vulnerabilities in LLMs, achieving high jailbreak success rates with minimal performance degradation.

CSC: Turning the Adversary's Poison against Itself

The paper proposes Cluster Segregation Concealment (CSC), a novel defense that identifies and neutralizes backdoor triggers by relabeling poisoned samples to a virtual class, achieving near-zero attack success rates with minimal accuracy loss.

When Embedding-Based Defenses Fail: Rethinking Safety in LLM-Based Multi-Agent Systems

This paper analyzes the failure of current embedding-based defenses in multi-agent LLM systems and proposes using token-level confidence scores (logits) for improved robustness.

MemMark: State-Evolution Attribution Watermarking for Agent Long-Term Memory Systems

MemMark introduces a state-evolution attribution watermark that embeds owner-controlled signals into latent memory-write decisions, enabling robust provenance tracking for agent memory even when all traditional logs and metadata are lost.

GUI Agents for Continual Game Generation

The paper proposes using GUI agents, both as objective evaluators and subjective playtesters, to significantly improve the generation of playable games from prompts, demonstrating a 66.8% rubric pass-rate with a novel iterative framework.

Multimodal Music Recommendation System using LLMs

The paper proposes a novel multimodal framework for session-based music recommendation that jointly models audio, lyric, and semantic content signals within a unified LLM-based sequential reasoning system.

Conformal Certification of Reasoning Trace Prefixes

The paper introduces CROP, a novel conformal procedure that provides rigorous statistical guarantees for certifying the longest safe prefix of a language model's reasoning trace, allowing for targeted error identification and repair.

AnyMo: Scaling Any-Modality Conditional Motion Generation with Masked Modeling

The paper introduces AnyMo, a unified multimodal framework that enables high-quality, scalable conditional human motion generation by leveraging a massive, cross-modal dataset and a masked modeling transformer.

DynSess: Dynamic Session-Level Evaluation and Optimization Framework for Role-Playing Agents

The paper introduces DynSess, a novel session-level framework that evaluates and optimizes role-playing agents by assessing long-horizon conversational quality, significantly outperforming existing turn-level methods.

FlowTime: Towards Continuous Generative Watch Time Prediction via Flow-based Personalized Priors

FlowTime proposes a novel Continuous Generative Regression framework using a Flow-based Personalized Prior to accurately model the multimodal and heterogeneous nature of user watch time prediction, significantly outperforming existing state-of-the-art methods.

On the Scaling of PEFT: Towards Million Personal Models of Trillion Parameters

The paper reframes Parameter-Efficient Fine-Tuning (PEFT) from a mere cost-saving alternative to a robust architecture for creating persistent, personalized models that layer specific behaviors onto large shared foundation models.

STaR-KV: Spatio-Temporal Adaptive Re-weighting for KV Cache Compression in GUI Vision-Language Models

STaR-KV introduces a novel, training-free KV cache compression framework that adaptively re-weights token importance across spatial, temporal, and distributional axes, significantly reducing GPU memory usage for GUI vision-language models while maintaining high accuracy.

Benchmarking LLM-as-a-Judge for Long-Form Output Evaluation

The paper introduces LongJudgeBench, a new benchmark designed to evaluate the reliability of LLM judges specifically for complex, long-form output evaluation, revealing significant instability gaps in current LLM judging methods.

OneReason Technical Report

The paper proposes OneReason, a framework that enhances the reasoning capability of generative recommendation models by focusing on improving item perception and structuring user behavior into coherent latent interests.

Learning to Reason by Analogy via Retrieval-Augmented Reinforcement Fine-Tuning

This paper proposes a post-training framework called Retrieval-Augmented Reinforcement Fine-Tuning (RA-RFT) to teach language models to reason by analogy.


Papers

cs.CL, cs.AI · Empirical · Recent · Jun 11, 2026

Learning to Reason by Analogy via Retrieval-Augmented Reinforcement Fine-Tuning

Zilin Xiao, Qi Ma, Chun-cheng Jason Chen, Xintao Chen +3 more

This paper proposes a post-training framework called Retrieval-Augmented Reinforcement Fine-Tuning (RA-RFT) to teach language models to reason by analogy.

cs.IR, cs.AI, cs.CL · Recent · Jun 4, 2026

OneReason Technical Report

OneRec Team, Biao Yang, Boyang Ding, Chenglong Chu +80 more

The paper proposes OneReason, a framework that enhances the reasoning capability of generative recommendation models by focusing on improving item perception and structuring user behavior into coheren…

cs.LG, cs.CL · Recent · Jun 1, 2026

On the Scaling of PEFT: Towards Million Personal Models of Trillion Parameters

Mind Lab: Song Cao, Vic Cao +51 more

The paper reframes Parameter-Efficient Fine-Tuning (PEFT) from a mere cost-saving alternative to a robust architecture for creating persistent, personalized models that layer specific behaviors onto l…

cs.CV, cs.AI · Recent · Jun 1, 2026

STaR-KV: Spatio-Temporal Adaptive Re-weighting for KV Cache Compression in GUI Vision-Language Models

Yuhang Han, Wenzheng Yang, Yujie Chen, Xiangqi Jin +3 more

STaR-KV introduces a novel, training-free KV cache compression framework that adaptively re-weights token importance across spatial, temporal, and distributional axes, significantly reducing GPU memor…

cs.CL · Recent · Jun 1, 2026

Benchmarking LLM-as-a-Judge for Long-Form Output Evaluation

Junjie Chen, Yuxi Dong, Haitao Li, Weihang Su +4 more

The paper introduces LongJudgeBench, a new benchmark designed to evaluate the reliability of LLM judges specifically for complex, long-form output evaluation, revealing significant instability gaps in…

cs.AI · Recent · May 31, 2026

FlowTime: Towards Continuous Generative Watch Time Prediction via Flow-based Personalized Priors

Hongxu Ma, Han Zhou, Chenghou Jin, Jie Zhang +4 more

FlowTime proposes a novel Continuous Generative Regression framework using a Flow-based Personalized Prior to accurately model the multimodal and heterogeneous nature of user watch time prediction, si…

cs.IR, cs.AI, cs.LG · Recent · May 28, 2026

Multimodal Music Recommendation System using LLMs

Srikar Prabhas Kandagatla, Sreehitha R. Narayana, Chandana Magapu, Swetha Mohan +5 more

The paper proposes a novel multimodal framework for session-based music recommendation that jointly models audio, lyric, and semantic content signals within a unified LLM-based sequential reasoning sy…

cs.AI, cs.CL, cs.LG · Recent · May 28, 2026

Conformal Certification of Reasoning Trace Prefixes

Matt Y. Cheung, Ashok Veeraraghavan, Hanjie Chen, Guha Balakrishnan

The paper introduces CROP, a novel conformal procedure that provides rigorous statistical guarantees for certifying the longest safe prefix of a language model's reasoning trace, allowing for targeted…

cs.CV, cs.AI · Recent · May 28, 2026

AnyMo: Scaling Any-Modality Conditional Motion Generation with Masked Modeling

Yiheng Li, Zhuo Li, Ruibing Hou, Yingjie Chen +3 more

The paper introduces AnyMo, a unified multimodal framework that enables high-quality, scalable conditional human motion generation by leveraging a massive, cross-modal dataset and a masked modeling tr…

cs.CL, cs.AI · Recent · May 28, 2026

DynSess: Dynamic Session-Level Evaluation and Optimization Framework for Role-Playing Agents

Rongsheng Zhang, Jiji Tang, Junnan Ren, Zuyi Bao +5 more

The paper introduces DynSess, a novel session-level framework that evaluates and optimizes role-playing agents by assessing long-horizon conversational quality, significantly outperforming existing tu…

cs.SE, cs.AI, cs.CV · Recent · May 27, 2026

GUI Agents for Continual Game Generation

Yixu Huang, Bo Li, Na Li, Zhe Wang +7 more

The paper proposes using GUI agents, both as objective evaluators and subjective playtesters, to significantly improve the generation of playable games from prompts, demonstrating a 66.8% rubric pass-…

cs.CR · Recent · May 24, 2026

MemMark: State-Evolution Attribution Watermarking for Agent Long-Term Memory Systems

Haobo Zhang, Xutao Mao, Guangyuan Dong, Ziwei Li +4 more

MemMark introduces a state-evolution attribution watermark that embeds owner-controlled signals into latent memory-write decisions, enabling robust provenance tracking for agent memory even when all t…

cs.CR, cs.LG, cs.MA · Recent · May 1, 2026

When Embedding-Based Defenses Fail: Rethinking Safety in LLM-Based Multi-Agent Systems

Lingxi Zhang, Guangtao Zheng, Hanjie Chen

This paper analyzes the failure of current embedding-based defenses in multi-agent LLM systems and proposes using token-level confidence scores (logits) for improved robustness.

cs.CR, cs.AI · Recent · Apr 23, 2026

CSC: Turning the Adversary's Poison against Itself

Yuchen Shi, Xin Guo, Huajie Chen, Tianqing Zhu +2 more

The paper proposes Cluster Segregation Concealment (CSC), a novel defense that identifies and neutralizes backdoor triggers by relabeling poisoned samples to a virtual class, achieving near-zero attac…

cs.CR, cs.AI, cs.SE · Recent · Apr 14, 2026

TEMPLATEFUZZ: Fine-Grained Chat Template Fuzzing for Jailbreaking and Red Teaming LLMs

Qingchao Shen, Zibo Xiao, Lili Huang, Enwei Hu +2 more

TEMPLATEFUZZ is a fine-grained fuzzing framework that systematically tests chat templates to find vulnerabilities in LLMs, achieving high jailbreak success rates with minimal performance degradation.
