Papers similar to 2606.01213

~ similar to 2606.01213· 19 results

cs.CVcs.AIRecentJun 3, 2026

GeM-NR: Geometry-Aware Multi-View Editing for Nonrigid Scene Changes

Josef Bengtson, Yaroslava Lochman, Fredrik Kahl

GeM-NR proposes a novel, training-free framework to achieve general multi-view image editing, enabling consistent edits that drastically change both the geometry and appearance of a nonrigid scene.

View →

cs.CVcs.AIRecentMay 30, 2026

CV-Arena: An Open Benchmark for Instructional Computer Vision Problem Solving with Human-AI Collaborative Preferences

Fangzhou Lin, Peiran Li, Lingyu Xu, Wenjing Chen +11 more

The paper introduces CV-Arena, a large-scale open benchmark for instructional computer vision, demonstrating that professional-grade image editing requires advanced capabilities in physical reasoning…

View →

cs.CRRecentMay 11, 2026

Generate "Normal", Edit Poisoned: Branding Injection via Hint Embedding in Image Editing

Desen Sun, Jason Hon, Howe Wang, Saarth Rajan +2 more

This paper investigates a novel security vulnerability where imperceptible branding hints can be injected into images and subsequently re-rendered onto new objects by generative AI models, proposing b…

View →

cs.AIRecentMay 27, 2026

BlazeEdit: Generalist Image Editing on Mobile Devices with Image-to-Image Diffusion Models

Fei Deng, Yanwu Xu, Zhipeng Bao, Zhixing Zhang +3 more

BlazeEdit is a highly efficient, generalist image-to-image diffusion model designed for on-device deployment, consolidating multiple editing tasks into a compact 195M parameter model that runs quickly…

View →

cs.CLcs.CVRecentMay 30, 2026

Do Text Edits Generalize to Visual Generation? Benchmarking Cross-Modal Knowledge Editing in UMMs

Xin Gao, Cheng Yang, Chufan Shi, Taylor Berg-Kirkpatrick

The paper introduces UniKE, a benchmark showing that successful knowledge edits in text-only multimodal models do not reliably transfer to image generation, revealing a significant modality gap.

View →

cs.CVRecentJun 1, 2026

LL-Bench: Rethinking Low-Level Vision Evaluation in the Era of Large-Scale Generative Models

Lu Liu, Huiyu Duan, Chenxin Zhu, Jintong Lu +5 more

The paper introduces LL-Bench, a comprehensive benchmark for evaluating large-scale generative models on low-level vision tasks, and proposes LL-Score, an MLLM-based evaluator that better aligns quali…

View →

cs.CRRecentApr 12, 2026

SEED: A Large-Scale Benchmark for Provenance Tracing in Sequential Deepfake Facial Edits

Mengieong Hoi, Zhedong Zheng, Ping Liu, Wei Liu

The paper introduces SEED, a large-scale benchmark dataset for tracing sequential deepfake facial edits, and proposes FAITH, a frequency-aware Transformer model that effectively detects and orders the…

View →

cs.CVcs.AIcs.CRRecentMar 30, 2026

TGIF2: Extended Text-Guided Inpainting Forgery Dataset & Benchmark

Hannes Mareen, Dimitrios Karageorgiou, Paschalis Giakoumoglou, Peter Lambert +2 more

The paper introduces TGIF2, an extended dataset and benchmark that evaluates the forensic robustness of image forgery detection methods against modern, advanced text-guided inpainting techniques.

View →

cs.CVcs.AIRecentMay 28, 2026

GPIC: A Giant Permissive Image Corpus for Visual Generation

Keshigeyan Chandrasegaran, Kyle Sargent, Suchir Agarwal, Michael Jang +5 more

The paper introduces GPIC, a massive, permissively licensed, and safety-filtered image corpus of 28 trillion pixels, designed to serve as a stable and accessible benchmark for large-scale visual gener…

View →

cs.CVcs.AIRecentMay 31, 2026

ProductWebGen: Benchmarking Multimodal Product Webpage Generation

Zhihong Liu, Siqi Kou, Zheng Li, Ye Ma +4 more

The paper introduces ProductWebGen, a benchmark for evaluating multimodal models' ability to generate consistent, high-fidelity product webpages from images and instructions, finding that separate edi…

View →

cs.CVcs.AIcs.LGRecentMay 29, 2026

Envisioning Beyond the Few: Disentangled Semantics and Primitives for Few-Shot Atypical Layout-to-Image Generation

Nan Bao, Yifan Zhao, Wenzhuang Wang, Jia Li

The paper proposes a disentangled representation framework to significantly improve few-shot layout-to-image generation by separating semantic identity from local visual details, thereby mitigating re…

View →

cs.CVcs.AIcs.CRRecentMar 18, 2026

Rel-Zero: Harnessing Patch-Pair Invariance for Robust Zero-Watermarking Against AI Editing

Pengzhen Chen, Yanwei Liu, Xiaoyan Gu, Xiaojun Chen +2 more

Rel-Zero proposes a novel zero-watermarking technique that embeds invisible watermarks by exploiting the invariance of relational distances between image patches during AI editing, achieving superior…

View →

cs.CVRecentJun 1, 2026

Thinking in Blender: Staged Executable Inverse Graphics with Vision-Language Models

Guangzhao He, Rundong Luo, Wei-Chiu Ma, Hadar Averbuch-Elor

The paper introduces Staged Executable Inverse Graphics (SEIG), an agentic framework that uses general-purpose Vision-Language Models (VLMs) to reconstruct editable 3D scenes directly into executable…

View →

cs.CVcs.CRRecentApr 17, 2026

Beyond Text Prompts: Precise Concept Erasure through Text-Image Collaboration

Jun Li, Lizhi Xiong, Ziqiang Li, Weiwei Jiang +3 more

The paper introduces TICoE, a text-image collaborative framework that achieves precise and faithful concept removal from text-to-image generative models, surpassing existing methods in both precision…

View →

cs.GRcs.AIcs.CVRecentMay 28, 2026

Before the Shutter: Aesthetic and Actionable Portrait Photography Planning in 3D Scenes

Ruixiang Jiang, Chang Wen Chen

The paper introduces 3D aesthetic portrait planning, a method that pre-calculates optimal human pose, camera, and lighting configurations within a 3D scene to generate visually compelling and physical…

View →

cs.CVRecentJun 1, 2026

Chroma Clues: Leveraging Color Statistics to Detect Synthetic Images

Lea Uhlenbrock, Davide Cozzolino, Christian Riess

This paper proposes using color statistics, specifically through novel color transformations, to detect AI-generated synthetic images by exploiting the color-imitation weaknesses of current generative…

View →

cs.CYcs.CLcs.CRRecentApr 15, 2026

Who Gets Flagged? The Pluralistic Evaluation Gap in AI Content Watermarking

Alexander Nemecek, Osama Zafar, Yuqiao Xu, Wenbiao Li +1 more

The paper argues that current AI content watermarking benchmarks fail to test for bias across different languages, cultures, and demographics, proposing a new set of evaluation standards to ensure fai…

View →

cs.CRcs.AIRecentMay 2, 2026

VisInject: Disruption != Injection -- A Dual-Dimension Evaluation of Universal Adversarial Attacks on Vision-Language Models

Pang Liu, Yingjie Lao

The paper introduces a dual-dimension evaluation for universal adversarial attacks on Vision-Language Models (VLMs), demonstrating that high reported attack success rates significantly overestimate th…

View →

cs.CVcs.AIRecentMay 31, 2026

Cross-Axis Feature Fusion with Joint-Wise Motion Difference Prediction for Text-Based 3D Human Motion Editing

Gyojin Han, Junmo Kim

The paper proposes a novel cross-axis feature fusion architecture and an auxiliary joint-difference prediction task to significantly improve text-based 3D human motion editing by better understanding…

View →