Rui Song

3 indexed papers

Top categories

AI ×2, Multimedia ×2, NLP ×1, Vision ×1, ML ×1, Crypto ×1

Frequent co-authors

Xinhao Song (1×)

Su Su (1×)

Sirui Song (1×)

Hongliang Wu (1×)

Wen Shen (1×)

Zhihua Wei (1×)

Research Timeline

2026

AgenticVBench: Can AI Agents Complete Real-World Post-Production Tasks?

The paper introduces AgenticVBench, a comprehensive benchmark of 100 real-world video post-production tasks, and finds that even the best AI agents perform significantly worse than human experts on these complex, multi-modal tasks.

HLL: Can Agents Cross Humanity's Last Line of Verification?

The paper introduces HLL, a benchmark that tests whether multimodal agents can stand in for humans at verification steps (such as CAPTCHAs) in complex, real-world workflows, and finds that current agents remain brittle and fail under realistic conditions.

Token Predictors Are Not Planners: Building Physically Grounded Causal Reasoners

The paper argues that current embodied planning benchmarks prioritize superficial language prediction over true physical reasoning, introducing new benchmarks and a large-scale dataset to demonstrate that physically grounded causal reasoning is necessary for reliable autonomous agents.

Papers

cs.AI, cs.CL, cs.CV · Recent · Jun 1, 2026

HLL: Can Agents Cross Humanity's Last Line of Verification?

Xinhao Song, Su Su, Sirui Song, Hongliang Wu +5 more

cs.AI · Recent · Jun 1, 2026