Papers similar to 2605.30570 · 19 results

cs.CV · Recent · Jun 1, 2026

From Zero to Hero: Training-Free Custom Concept Spawning in World Models

The paper introduces SPAWN, a training-free method that allows users to inject specified visual concepts into existing autoregressive world models, enabling controllable scene composition beyond the i…

cs.AI, cs.MA · Recent · May 28, 2026

On the Geometry of Games and their Solvers

Yaqi Sun, Julian Ma, David Mguni

The paper proposes a unified framework that maps the geometry of games to effective solver dynamics, suggesting that solvability is governed by continuous structural properties rather than discrete cl…

cs.CV, cs.AI, cs.GR · Recent · May 31, 2026

3DCodeBench: Benchmarking Agentic Procedural 3D Modeling Via Code

Yipeng Gao, Lei Shu, Genzhi Ye, Xi Xiong +4 more

The paper introduces 3DCodeBench, a systematic benchmark and platform for evaluating Vision-Language Model (VLM) agents' ability to generate procedural 3D models from text and images using code.

cs.AI · Recent · May 27, 2026

CubePart: An Open-Vocabulary Part-Controllable 3D Generator

Yiheng Zhu, Kangle Deng, Jean-Philippe Fauconnier, Inaki Navarro +8 more

CubePart is a generative framework that enables the creation of complex 3D meshes by explicitly controlling and generating individual, semantically defined parts based on open-vocabulary text prompts.

cs.CL, cs.AI, cs.LG · Recent · Jun 1, 2026

MMG2Skill: Can Agents Distill In-the-Wild Guides into Self-Evolving Skills?

Xinyu Che, Junqi Xiong, Yunfei Ge, Xinping Lei +9 more

The paper introduces MMG2Skill, a closed-loop framework that converts noisy, human-oriented web guides into editable, executable skills, significantly improving agent performance across diverse tasks.

cs.CL · Recent · May 29, 2026

MineExplorer: Evaluating Open-World Exploration of MLLM Agents in Minecraft

Tianjie Ju, Yueqing Sun, Zheng Wu, Wei Zhang +6 more

The paper introduces MineExplorer, a new benchmark in Minecraft, to evaluate the sustained open-world exploration capabilities of MLLM agents, finding that long-horizon coordination remains a signific…

cs.CV, cs.AI · Recent · Jun 1, 2026

Fast and Lightweight Novel View Synthesis with Differentiable Multiplane Image

Kaidi Zhang, Guanxu Zhu

The paper proposes a fast and lightweight novel view synthesis method using a differentiable Multiplane Image (MPI) representation, achieving significant speed and size improvements over state-of-the-…

cs.CV · Recent · Jun 1, 2026

Thinking in Blender: Staged Executable Inverse Graphics with Vision-Language Models

Guangzhao He, Rundong Luo, Wei-Chiu Ma, Hadar Averbuch-Elor

The paper introduces Staged Executable Inverse Graphics (SEIG), an agentic framework that uses general-purpose Vision-Language Models (VLMs) to reconstruct editable 3D scenes directly into executable…

cs.AI · Recent · Jun 1, 2026

WorldCoder-Bench: Benchmarking Physically Grounded 3D World Synthesis

Shuo Lu, Yinuo Xu, Kecheng Yu, Siru Jiang +7 more

The paper introduces WorldCoder-Bench, a comprehensive benchmark and evaluation protocol for testing LLMs' ability to autonomously generate complex, physically grounded, and interactive 3D web worlds.

cs.SE, cs.AI, cs.CV · Recent · May 27, 2026

GUI Agents for Continual Game Generation

Yixu Huang, Bo Li, Na Li, Zhe Wang +7 more

The paper proposes using GUI agents, both as objective evaluators and subjective playtesters, to significantly improve the generation of playable games from prompts, demonstrating a 66.8% rubric pass-…

cs.CV, cs.AI · Recent · May 28, 2026

CityGen: Structure-Guided City-Style Synthesis for Cross-City Autonomous Driving

Zezhong Qian, Zhao Yang, Lu Tan, Zhihao Yan +3 more

The paper introduces CityGen, a diffusion-based framework that enables zero-label city adaptation for autonomous driving by synthesizing city-style data conditioned on HD maps and visual prompts, sign…

cs.CV, cs.AI · Recent · Jun 1, 2026

Initialization is Half the Battle: Generating Diverse Images from a Guidance Potential Posterior

Xiang Li, Dianbo Liu, Kenji Kawaguchi

The paper introduces Diversity-inducing Initialization (DivIn), a novel method that improves image diversity by re-weighting the initial noise selection based on the guidance potential, thereby mitiga…

cs.CV, cs.CR, cs.LG · Recent · May 14, 2026

Systematic Discovery of Semantic Attacks in Online Map Construction through Conditional Diffusion

Chenyi Wang, Ruoyu Song, Raymond Muller, Jean-Philippe Monteuuis +4 more

The paper introduces MIRAGE, a framework that systematically discovers semantic attacks on online HD map construction by finding plausible environmental variations that bypass standard adversarial def…

cs.AI · Recent · May 28, 2026

SkillsInjector: Dynamic Skill Context Construction for LLM Agents

Yanchao Li, Wanhao Liu, Ben Gao, Jiaqing Xie +4 more

SkillsInjector proposes a two-stage adaptive method to dynamically optimize skill selection, quantity, and presentation for LLM agents, significantly improving task performance over static injection m…

cs.CR, cs.AI, cs.CV · Recent · May 11, 2026

BEACON: A Multimodal Dataset for Learning Behavioral Fingerprints from Gameplay Data

Ishpuneet Singh, Gursmeep Kaur, Uday Pratap Singh Atwal, Guramrit Singh +2 more

The paper introduces BEACON, a large-scale, multimodal dataset capturing diverse behavioral signals from competitive Valorant gameplay, designed for rigorous testing of continuous authentication and b…

cs.SE, cs.AI, cs.LG · Recent · May 29, 2026

How Generation Architecture Shapes Code Complexity in Multi-Agent LLM Systems: A Paired Study on HumanEval

Nazmus Ashrafi

The study found that while multi-agent LLM code generation architectures significantly affect code complexity, the added complexity does not translate into better functional correctness, suggesting ar…

cs.AI · Recent · May 28, 2026

PTCG-Bench: Can LLM Agents Master Pokémon Trading Card Game?

Dongdong Hua, Yifei Sun, Renhong Huang, Feng Gao +2 more

The paper introduces PTCG-Bench, a new benchmark using the Pokémon TCG to evaluate LLM agents' strategic decision-making and ability to self-evolve, finding that sustained self-evolution remains chall…

cs.CR, cs.CV · Recent · May 7, 2026

Secure Seed-Based Multi-bit Watermarking for Diffusion Models from First Principles

Enoal Gesny, Eva Giboulot

The paper introduces a theoretically grounded evaluation framework for watermarking generative models, proposing a novel method (SSB) that allows for systematic design across all security-robustness-f…

cs.GR, cs.CV, cs.LG · Recent · Jun 3, 2026

Geometry Gaussians: Decoupling Appearance and Geometry in Gaussian Splatting

Hongyu Zhou, Zorah Lähner

The paper proposes a novel method to improve the simultaneous representation of appearance and geometry in 3D Gaussian Splatting by introducing an additional geometry opacity parameter.
