Miao Pan
2 indexed papers
Research Timeline
The paper introduces DiffErase, a black-box attack that uses diffusion models to remove inaudible audio watermarks while preserving perceptual quality.
The paper proposes Astra, an agentic framework that enables Vision-Language Models (VLMs) to perform spatial reasoning by actively generating and using imagined visual evidence from a world simulator.
Papers
Thinking with Imagination: Agentic Visual Spatial Reasoning with World Simulators
Chenming Zhu, Jingli Lin, Yilin Long, Peizhou Cao +3 more