Liang Wu
2 indexed papers
Research Timeline
The paper introduces PhoneWorld, a scalable pipeline that automatically converts real-world GUI trajectories and screenshots into controllable, reproducible phone-use environments, significantly improving agent performance across multiple mobile benchmarks.
The paper introduces HLL, a benchmark that tests whether multimodal agents can stand in for humans at verification steps (such as CAPTCHAs) in complex, real-world workflows. It finds that current agents remain brittle and fail under realistic conditions.
Papers
HLL: Can Agents Cross Humanity's Last Line of Verification?
Xinhao Song, Su Su, Sirui Song, Hongliang Wu +5 more