Built by Teycir Ben Soltane
ArXivCSExplorer

Yujiu Yang

5 indexed papers

Recent (6 mo): 5
With code: 0
Influential cites: 0
Benchmarked: 0

Publications per year

2026: 5

Top categories

NLP ×4, AI ×4, ML ×3, Vision ×2

Frequent co-authors

Junjie Wang (2×)
Chufan Shi (2×)
Yusong Zhao (1×)
Yuejin Xie (1×)
Youliang Yuan (1×)
Junjie Hu (1×)

Research Timeline

2026
OmniVerifier-M1: Multimodal Meta-Verifier with Explicit Structured Recalibration

The paper introduces OmniVerifier-M1, a multimodal meta-verifier that uses symbolic outputs and decoupled reinforcement learning to provide robust, fine-grained verification and error localization for large multimodal models.

Integrated and Cross-Architecture Interpretation of LLM Reasoning

The paper introduces an Integrated, cross-Architecture Reasoning (IAR) framework to provide a unified and robust method for interpreting the opaque reasoning processes within Large Language Models.

Smaller Models are Natural Explorers for Policy-Level Diversity in GRPO

The paper proposes S2L-PO, a framework that uses smaller, naturally diverse models as structured explorers to enhance the policy-level diversity and performance of larger language models during training.

Internalize the Temperature: On-Policy Self-Distillation as Policy Reheater for Reinforcement Learning

The paper introduces Temperature-Scaled On-Policy Self-Distillation (TS-OPSD), a novel method that internalizes temperature-based policy reheating into model parameters to combat entropy collapse in reinforcement learning.

PaSBench-Video: A Streaming Video Benchmark for Proactive Safety Warning

The paper introduces PaSBench-Video, a streaming video benchmark that tests multimodal LLMs' ability to issue proactive safety warnings, and finds that current models lack temporal precision and produce high false-positive rates.

Papers

cs.CL, cs.AI, cs.CV | Recent | Jun 1, 2026

PaSBench-Video: A Streaming Video Benchmark for Proactive Safety Warning

Yusong Zhao, Yuejin Xie, Youliang Yuan, Junjie Hu +3 more

cs.CL, cs.LG | Recent | May 30, 2026

Internalize the Temperature: On-Policy Self-Distillation as Policy Reheater for Reinforcement Learning

Xuewei Yang, Jiachen Yu, Jie Wu, Shaoning Sun +2 more

cs.LG, cs.AI | Recent | May 29, 2026

Smaller Models are Natural Explorers for Policy-Level Diversity in GRPO

Yiming Ren, Yiran Xu, Zicheng Lin, Chufan Shi +7 more

cs.CL, cs.AI, cs.CV | Recent | May 27, 2026

OmniVerifier-M1: Multimodal Meta-Verifier with Explicit Structured Recalibration

Xinchen Zhang, Bowei Liu, Jiale Liu, Chufan Shi +6 more

cs.CL, cs.AI | Recent | May 27, 2026

Integrated and Cross-Architecture Interpretation of LLM Reasoning

Leonardo Matthew Yauw, Wei-Bin Kou, Yujiu Yang
