OmniGen2: Exploration to Advanced Multimodal Generation
💡 Quick Tips for Best Results (see our GitHub for more details)
Image Quality: Use high-resolution images (at least 512x512 recommended).
Be Specific: Instead of "Add bird to desk", try "Add the bird from image 1 to the desk in image 2".
Use English: English prompts currently yield better results.
Increase image_guidance_scale for better consistency with the reference image:
Image Editing: 1.3 - 2.0
In-context Generation: 2.0 - 3.0
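Separate text and image guidance scales are typically combined as two nested classifier-free-guidance terms on the denoiser output. The sketch below shows that common formulation (OmniGen2's exact combination may differ; the function name and array inputs are illustrative):

```python
import numpy as np

def combined_cfg(eps_uncond, eps_img, eps_full, text_scale, image_scale):
    # eps_uncond: prediction with no conditioning
    # eps_img:    prediction conditioned on the reference image only
    # eps_full:   prediction conditioned on image + text
    # Image guidance pushes away from the unconditional prediction;
    # text guidance pushes away from the image-only prediction.
    return (eps_uncond
            + image_scale * (eps_img - eps_uncond)
            + text_scale * (eps_full - eps_img))

# With both scales at 1.0 this reduces to the fully conditioned prediction.
e_u, e_i, e_f = np.zeros(4), np.ones(4), np.full(4, 2.0)
print(combined_cfg(e_u, e_i, e_f, 1.0, 1.0))  # → [2. 2. 2. 2.]
```

Raising `image_guidance_scale` amplifies the image-conditioning term, which is why higher values keep the output closer to the reference image.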
For in-context editing (editing based on multiple images), we recommend the following prompt format: "Edit the first image: add/replace (the [object] with) the [object] from the second image. [description of your target image]."
For example: "Edit the first image: add the man from the second image. The man is talking with a woman in the kitchen."
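The recommended format can be assembled programmatically when building prompts in bulk. This small helper is hypothetical (the model only ever sees the final string):

```python
def build_incontext_edit_prompt(action: str, obj: str, description: str = "") -> str:
    """Assemble an in-context edit prompt in the recommended format.

    action: "add" or "replace"; obj: the object to take from the second image;
    description: optional description of the target image.
    """
    prompt = f"Edit the first image: {action} the {obj} from the second image."
    if description:
        prompt += f" {description}"
    return prompt

print(build_incontext_edit_prompt(
    "add", "man", "The man is talking with a woman in the kitchen"))
# → Edit the first image: add the man from the second image. The man is talking with a woman in the kitchen
```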
Although OmniGen2 improves on OmniGen 1.0 in several areas, some issues remain; it may take multiple attempts to achieve a satisfactory result.
Examples
Enter your prompt. Use "first/second image" or "第一张图/第二张图" ("first image/second image" in Chinese) as references.
Demo parameters: Width, Height, Scheduler, Inference Steps, First/Second/Third Image, Negative Prompt, Text Guidance Scale, Image Guidance Scale, CFG Range Start, CFG Range End, Number of images per prompt, max_input_image_side_length, max_pixels, Seed.
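`max_input_image_side_length` and `max_pixels` plausibly cap input-image resolution by downscaling while preserving aspect ratio. A sketch of that kind of size fitting (the helper name, defaults, and rounding are assumptions, not OmniGen2's actual code):

```python
import math

def fit_image_size(w: int, h: int, max_side: int = 2048,
                   max_pixels: int = 1_048_576) -> tuple[int, int]:
    # Scale down (never up) so the longest side stays within max_side
    # and the total pixel count stays within max_pixels,
    # preserving the aspect ratio.
    scale = min(1.0, max_side / max(w, h), math.sqrt(max_pixels / (w * h)))
    return max(1, int(w * scale)), max(1, int(h * scale))

print(fit_image_size(4000, 3000))  # downscaled to fit both limits
print(fit_image_size(640, 480))   # already within limits, unchanged
```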
@article{wu2025omnigen2,
title={OmniGen2: Exploration to Advanced Multimodal Generation},
author={Chenyuan Wu and Pengfei Zheng and Ruiran Yan and Shitao Xiao and Xin Luo and Yueze Wang and Wanli Li and Xiyan Jiang and Yexin Liu and Junjie Zhou and Ze Liu and Ziyi Xia and Chaofan Li and Haoge Deng and Jiahao Wang and Kun Luo and Bo Zhang and Defu Lian and Xinlong Wang and Zhongyuan Wang and Tiejun Huang and Zheng Liu},
journal={arXiv preprint arXiv:2506.18871},
year={2025}
}