~ similar to 2605.30570· 19 results
The paper introduces SPAWN, a training-free method that allows users to inject specified visual concepts into existing autoregressive world models, enabling controllable scene composition beyond the i…
The paper proposes a unified framework that maps the geometry of games to effective solver dynamics, suggesting that solvability is governed by continuous structural properties rather than discrete cl…
Yipeng Gao, Lei Shu, Genzhi Ye, Xi Xiong +4 more
The paper introduces 3DCodeBench, a systematic benchmark and platform for evaluating Vision-Language Model (VLM) agents' ability to generate procedural 3D models from text and images using code.
CubePart is a generative framework that enables the creation of complex 3D meshes by explicitly controlling and generating individual, semantically defined parts based on open-vocabulary text prompts.
Xinyu Che, Junqi Xiong, Yunfei Ge, Xinping Lei +9 more
The paper introduces MMG2Skill, a closed-loop framework that converts noisy, human-oriented web guides into editable, executable skills, significantly improving agent performance across diverse tasks.
Tianjie Ju, Yueqing Sun, Zheng Wu, Wei Zhang +6 more
The paper introduces MineExplorer, a new benchmark in Minecraft, to evaluate the sustained open-world exploration capabilities of MLLM agents, finding that long-horizon coordination remains a signific…
The paper proposes a fast and lightweight novel view synthesis method using a differentiable Multiplane Image (MPI) representation, achieving significant speed and size improvements over state-of-the-…
The paper introduces Staged Executable Inverse Graphics (SEIG), an agentic framework that uses general-purpose Vision-Language Models (VLMs) to reconstruct editable 3D scenes directly into executable…
Shuo Lu, Yinuo Xu, Kecheng Yu, Siru Jiang +7 more
The paper introduces WorldCoder-Bench, a comprehensive benchmark and evaluation protocol for testing LLMs' ability to autonomously generate complex, physically grounded, and interactive 3D web worlds.
Yixu Huang, Bo Li, Na Li, Zhe Wang +7 more
The paper proposes using GUI agents, both as objective evaluators and subjective playtesters, to significantly improve the generation of playable games from prompts, demonstrating a 66.8% rubric pass-…
Zezhong Qian, Zhao Yang, Lu Tan, Zhihao Yan +3 more
The paper introduces CityGen, a diffusion-based framework that enables zero-label city adaptation for autonomous driving by synthesizing city-style data conditioned on HD maps and visual prompts, sign…
The paper introduces Diversity-inducing Initialization (DivIn), a novel method that improves image diversity by re-weighting the initial noise selection based on the guidance potential, thereby mitiga…
The paper introduces MIRAGE, a framework that systematically discovers semantic attacks on online HD map construction by finding plausible environmental variations that bypass standard adversarial def…
Yanchao Li, Wanhao Liu, Ben Gao, Jiaqing Xie +4 more
SkillsInjector proposes a two-stage adaptive method to dynamically optimize skill selection, quantity, and presentation for LLM agents, significantly improving task performance over static injection m…
The paper introduces BEACON, a large-scale, multimodal dataset capturing diverse behavioral signals from competitive Valorant gameplay, designed for rigorous testing of continuous authentication and b…
The study found that while multi-agent LLM code generation architectures significantly affect code complexity, the added complexity does not translate into better functional correctness, suggesting ar…
Dongdong Hua, Yifei Sun, Renhong Huang, Feng Gao +2 more
The paper introduces PTCG-Bench, a new benchmark using the Pokémon TCG to evaluate LLM agents' strategic decision-making and ability to self-evolve, finding that sustained self-evolution remains chall…
The paper introduces a theoretically grounded evaluation framework for watermarking generative models, proposing a novel method (SSB) that allows for systematic design across all security-robustness-f…
The paper proposes a novel method to improve the simultaneous representation of appearance and geometry in 3D Gaussian Splatting by introducing an additional geometry opacity parameter.