Sherzod Hakimov
2 indexed papers
Research Timeline
The paper evaluates Vision-Language Models (VLMs) on a collaborative dialogue task that requires spatial reconstruction. It finds that detailed text representations improve results, but the models still struggle with complex visual-spatial reasoning.
The paper introduces the Image Reconstruction Game, a benchmark showing that the quality of the descriptive model is the primary determinant of image reconstruction success, while the generator's role is secondary.
Papers
The Image Reconstruction Game: Drawing Common Ground Through Iterative Multimodal Dialogue