Built by Teycir Ben Soltane
ArXivCSExplorer

Rui Song

3 indexed papers

Recent (6 mo): 3
With code: 0
Influential cites: 0
Benchmarked: 0

Publications per year

2026: 3

Top categories

AI ×2, Multimedia ×2, NLP ×1, Vision ×1, ML ×1, Crypto ×1

Frequent co-authors

Xinhao Song (1×)
Su Su (1×)
Sirui Song (1×)
Hongliang Wu (1×)
Wen Shen (1×)
Zhihua Wei (1×)

Research Timeline

2026
AgenticVBench: Can AI Agents Complete Real-World Post-Production Tasks?

The paper introduces AgenticVBench, a comprehensive benchmark of 100 real-world video post-production tasks, and finds that even the best AI agents perform significantly worse than human experts on these complex, multi-modal tasks.

HLL: Can Agents Cross Humanity's Last Line of Verification?

The paper introduces HLL, a benchmark that tests whether multimodal agents can substitute for human verification (such as CAPTCHA) in complex, real-world workflows. It finds that current agents are still brittle and fail under realistic conditions.

Token Predictors Are Not Planners: Building Physically Grounded Causal Reasoners

The paper argues that current embodied planning benchmarks prioritize superficial language prediction over true physical reasoning, introducing new benchmarks and a large-scale dataset to demonstrate that physically grounded causal reasoning is necessary for reliable autonomous agents.


Papers

cs.AI, cs.CL, cs.CV | Recent | Jun 1, 2026

HLL: Can Agents Cross Humanity's Last Line of Verification?

Xinhao Song, Su Su, Sirui Song, Hongliang Wu +5 more

cs.AI | Recent | Jun 1, 2026

Token Predictors Are Not Planners: Building Physically Grounded Causal Reasoners

Zheng Lu, Mingqi Gao, Qinlei Xie, Wanqi Zhong +7 more

cs.CR, cs.MM | Recent | May 26, 2026

AgenticVBench: Can AI Agents Complete Real-World Post-Production Tasks?

Zongheng Cao, Yi Zheng, Rui Song, Xinyu Hu
