Ming-Hsuan Yang
2 indexed papers
Research Timeline
The paper introduces CV-Arena, a large-scale open benchmark for instructional computer vision, demonstrating that professional-grade image editing requires advanced capabilities in physical reasoning and structural control.
Reasmory introduces a structured programming framework that uses explicit 3D memory and a Domain-Specific Language (DSL) to reliably enhance Vision-Language Models' spatial reasoning capabilities, achieving significant gains over unconstrained tool use.
Papers
Reasmory: 3D Reconstruction as Explicit Memory for VLMs Spatial Reasoning