Rui Yan

9 indexed papers

Top categories

AI ×8, NLP ×4, ML ×3, Crypto ×3, Vision ×2, Info Retrieval ×1, Software Eng. ×1, Multimedia ×1

Frequent co-authors

Rui Yang 4×

Yuxi Chen 2×

Hao Bai 2×

Huan Zhang 2×

Tong Zhang 2×

Qianhui Wu 1×

Research Timeline

2026

CAPTCHA Solving for Native GUI Agents: Automated Reasoning-Action Data Generation and Self-Corrective Training

The paper introduces ReCAP, a native GUI agent that significantly improves CAPTCHA solving success (from 30% to 80%) by integrating specialized CAPTCHA capabilities into a general-purpose, end-to-end vision-language model.

Structured Visual Narratives Undermine Safety Alignment in Multimodal Large Language Models

This paper introduces ComicJailbreak, a new benchmark demonstrating that structured visual narratives can effectively jailbreak Multimodal Large Language Models (MLLMs), which points to a need for new safety alignment methods.

Hackers or Hallucinators? A Comprehensive Analysis of LLM-Based Automated Penetration Testing

This paper provides the first comprehensive systematization and large-scale empirical evaluation of existing LLM-based Automated Penetration Testing (AutoPT) frameworks, offering a structured taxonomy and unified benchmark for the field.

PRO-CUA: Process-Reward Optimization for Computer Use Agents

PRO-CUA introduces a process-reward optimization framework that enables efficient, step-level reinforcement learning for training computer use agents by decoupling environment interaction from policy optimization.

EviLink: Multi-Path Schema Linking with Uncertainty-Guided Evidence Acquisition for Large-Scale Text-to-SQL

EviLink addresses the ambiguity of schema linking in Text-to-SQL by treating it as an uncertainty-aware inference over multiple plausible SQL paths, significantly improving recall and efficiency.

PhoneWorld: Scaling Phone-Use Agent Environments

The paper introduces PhoneWorld, a scalable pipeline that automatically converts real-world GUI trajectories and screenshots into controllable, reproducible phone-use environments, significantly improving agent performance across multiple mobile benchmarks.

LoopFM: Learning frOm HistOrical RePresentations of Foundation Model for Recommendation

LoopFM proposes a novel framework to significantly improve knowledge distillation for recommendation systems by structuring the rich intermediate embeddings of large foundation models as input features, thereby overcoming the limitations of single-scalar prediction transfer.

Semantic Triplet Restoration: A Novel Protocol for Hierarchical Table Understanding in Large Language Models

The paper introduces Semantic Triplet Restoration (STR), a novel protocol that converts complex table structures into atomic semantic triplets, improving table question answering by providing explicit semantic context and reducing reliance on layout-dependent serializations.

OpenWebRL: Demystifying Online Multi-turn Reinforcement Learning for Visual Web Agents

The paper introduces OpenWebRL, an open framework that enables training visual web agents using online multi-turn Reinforcement Learning directly on live websites, achieving state-of-the-art performance on challenging web benchmarks.

Papers

cs.LG · cs.AI · cs.CL · Recent · Jun 1, 2026

OpenWebRL: Demystifying Online Multi-turn Reinforcement Learning for Visual Web Agents

Rui Yang, Qianhui Wu, Yuxi Chen, Hao Bai +6 more

cs.CL · Recent · May 29, 2026