~ similar to 2606.01213· 19 results
GeM-NR proposes a novel, training-free framework to achieve general multi-view image editing, enabling consistent edits that drastically change both the geometry and appearance of a nonrigid scene.
Fangzhou Lin, Peiran Li, Lingyu Xu, Wenjing Chen +11 more
The paper introduces CV-Arena, a large-scale open benchmark for instructional computer vision, demonstrating that professional-grade image editing requires advanced capabilities in physical reasoning…
Desen Sun, Jason Hon, Howe Wang, Saarth Rajan +2 more
This paper investigates a novel security vulnerability where imperceptible branding hints can be injected into images and subsequently re-rendered onto new objects by generative AI models, proposing b…
Fei Deng, Yanwu Xu, Zhipeng Bao, Zhixing Zhang +3 more
BlazeEdit is a highly efficient, generalist image-to-image diffusion model designed for on-device deployment, consolidating multiple editing tasks into a compact 195M parameter model that runs quickly…
The paper introduces UniKE, a benchmark showing that successful knowledge edits in text-only multimodal models do not reliably transfer to image generation, revealing a significant modality gap.
Lu Liu, Huiyu Duan, Chenxin Zhu, Jintong Lu +5 more
The paper introduces LL-Bench, a comprehensive benchmark for evaluating large-scale generative models on low-level vision tasks, and proposes LL-Score, an MLLM-based evaluator that better aligns quali…
The paper introduces SEED, a large-scale benchmark dataset for tracing sequential deepfake facial edits, and proposes FAITH, a frequency-aware Transformer model that effectively detects and orders the…
The paper introduces TGIF2, an extended dataset and benchmark that evaluates the forensic robustness of image forgery detection methods against modern, advanced text-guided inpainting techniques.
The paper introduces GPIC, a massive, permissively licensed, and safety-filtered image corpus of 28 trillion pixels, designed to serve as a stable and accessible benchmark for large-scale visual gener…
Zhihong Liu, Siqi Kou, Zheng Li, Ye Ma +4 more
The paper introduces ProductWebGen, a benchmark for evaluating multimodal models' ability to generate consistent, high-fidelity product webpages from images and instructions, finding that separate edi…
The paper proposes a disentangled representation framework to significantly improve few-shot layout-to-image generation by separating semantic identity from local visual details, thereby mitigating re…
Pengzhen Chen, Yanwei Liu, Xiaoyan Gu, Xiaojun Chen +2 more
Rel-Zero proposes a novel zero-watermarking technique that embeds invisible watermarks by exploiting the invariance of relational distances between image patches during AI editing, achieving superior…
The paper introduces Staged Executable Inverse Graphics (SEIG), an agentic framework that uses general-purpose Vision-Language Models (VLMs) to reconstruct editable 3D scenes directly into executable…
Jun Li, Lizhi Xiong, Ziqiang Li, Weiwei Jiang +3 more
The paper introduces TICoE, a text-image collaborative framework that achieves precise and faithful concept removal from text-to-image generative models, surpassing existing methods in both precision…
The paper introduces 3D aesthetic portrait planning, a method that pre-calculates optimal human pose, camera, and lighting configurations within a 3D scene to generate visually compelling and physical…
This paper proposes using color statistics, specifically through novel color transformations, to detect AI-generated synthetic images by exploiting the color-imitation weaknesses of current generative…
Alexander Nemecek, Osama Zafar, Yuqiao Xu, Wenbiao Li +1 more
The paper argues that current AI content watermarking benchmarks fail to test for bias across different languages, cultures, and demographics, proposing a new set of evaluation standards to ensure fai…
The paper introduces a dual-dimension evaluation for universal adversarial attacks on Vision-Language Models (VLMs), demonstrating that high reported attack success rates significantly overestimate th…
The paper proposes a novel cross-axis feature fusion architecture and an auxiliary joint-difference prediction task to significantly improve text-based 3D human motion editing by better understanding…