Commit 422fa4a

Update sd-v2-infinite-zoom to genai usage (#2731)
CVS-161647
1 parent e3eef01 commit 422fa4a

File tree

1 file changed: +65 −20 lines changed

notebooks/stable-diffusion-v2/stable-diffusion-v2-infinite-zoom.ipynb

@@ -26,7 +26,7 @@
 "* The model comes with a new refined depth architecture capable of preserving context from prior generation layers in an image-to-image setting. This structure preservation helps generate images that preserve the forms and shadows of objects, but with different content.\n",
 "* The model comes with an updated inpainting module built upon the previous model. This text-guided inpainting makes switching out parts in the image easier than before.\n",
 "\n",
-"This notebook demonstrates how to download the model from the Hugging Face Hub and converted to OpenVINO IR format with [Optimum Intel](https://huggingface.co/docs/optimum/intel/inference#stable-diffusion). And how to use the model to generate sequence of images for infinite zoom video effect.\n",
+"This notebook demonstrates how to download the model from the Hugging Face Hub, convert it to OpenVINO IR format with the [Hugging Face Optimum Intel](https://huggingface.co/docs/optimum/intel/index) library, and use it to generate a sequence of images for an infinite zoom video effect with [OpenVINO GenAI](https://github.com/openvinotoolkit/openvino.genai), which provides an easy-to-use API.\n",
 "\n",
 "\n",
 "<img referrerpolicy=\"no-referrer-when-downgrade\" src=\"https://static.scarf.sh/a.png?x-pxid=5b5a4db0-7875-4bfb-bdbd-01698b5b1a77&file=notebooks/stable-diffusion-v2/stable-diffusion-v2-infinite-zoom.ipynb\" />\n"
@@ -103,7 +103,8 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"%pip install -q \"diffusers>=0.14.0\" \"transformers>=4.25.1\" \"gradio>=4.19\" \"openvino>=2024.2.0\" \"torch>=2.1\" Pillow opencv-python \"git+https://github.com/huggingface/optimum-intel.git\" --extra-index-url https://download.pytorch.org/whl/cpu"
+"%pip install -q -U \"openvino>=2025.0\" \"openvino-genai>=2025.0\"\n",
+"%pip install -q \"diffusers>=0.14.0\" \"transformers>=4.25.1\" \"gradio>=4.19\" \"torch>=2.1\" Pillow opencv-python \"git+https://github.com/huggingface/optimum-intel.git\" --extra-index-url https://download.pytorch.org/whl/cpu"
 ]
 },
 {
@@ -115,9 +116,27 @@
 "## Load Stable Diffusion Inpaint pipeline using Optimum Intel\n",
 "[back to top ⬆️](#Table-of-contents:)\n",
 "\n",
-"We will load optimized Stable Diffusion model from the Hugging Face Hub and create pipeline to run an inference with OpenVINO Runtime by [Optimum Intel](https://huggingface.co/docs/optimum/intel/inference#stable-diffusion). \n",
+"[stable-diffusion-2-inpainting](https://huggingface.co/stabilityai/stable-diffusion-2-inpainting) is available for download from the [Hugging Face Hub](https://huggingface.co/models). We will use the optimum-cli interface to export it into OpenVINO Intermediate Representation (IR) format.\n",
 "\n",
-"For running the Stable Diffusion model with Optimum Intel, we will use the optimum.intel.OVStableDiffusionInpaintPipeline class, which represents the inference pipeline. OVStableDiffusionInpaintPipeline initialized by the from_pretrained method. It supports on-the-fly conversion models from PyTorch using the export=True parameter. A converted model can be saved on disk using the save_pretrained method for the next running. \n",
+"The Optimum CLI interface supports exporting models to OpenVINO (available since optimum-intel 1.12).\n",
+"General command format:\n",
+"\n",
+"```bash\n",
+"optimum-cli export openvino --model <model_id_or_path> --task <task> <output_dir>\n",
+"```\n",
+"\n",
+"where `task` is the task to export the model for; if not specified, the task will be auto-inferred from the model.\n",
+"\n",
+"You can find a mapping between tasks and model classes in the Optimum TaskManager [documentation](https://huggingface.co/docs/optimum/exporters/task_manager).\n",
+"\n",
+"Additionally, you can specify weight compression with `--weight-format`. Please note that for INT8/INT4 compression it is necessary to install nncf.\n",
+"\n",
+"The full list of supported arguments is available via `--help`.\n",
+"For more details and usage examples, please check the [Optimum documentation](https://huggingface.co/docs/optimum/intel/inference#export).\n",
+"\n",
+"\n",
+"To run the Stable Diffusion model, we will use [OpenVINO GenAI](https://github.com/openvinotoolkit/openvino.genai), which provides an easy-to-use API for image generation. First we will create a pipeline with `InpaintingPipeline`. You can find more details in the [Python image generation pipeline example](https://github.com/openvinotoolkit/openvino.genai/tree/releases/2025/0/samples/python/image_generation#run-inpainting-pipeline).\n",
+"Then we call the `generate` method, obtain the resulting image tensor, and convert it into an image with `Image.fromarray` from PIL. The input images are converted to `ov.Tensor` with the `image_to_tensor` function.\n",
 "\n",
 "Select the device from the dropdown list to run inference with OpenVINO."
 ]
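For this model, the `optimum_cli` Python helper used later in this diff expands to roughly the following shell command (the `fp16` weight format and `sd2_inpainting` output directory match the code cell below; `--task` is omitted and auto-inferred):

```bash
optimum-cli export openvino --model stabilityai/stable-diffusion-2-inpainting --weight-format fp16 sd2_inpainting
```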
@@ -138,6 +157,10 @@
 "    )\n",
 "    open(\"notebook_utils.py\", \"w\").write(r.text)\n",
 "\n",
+"if not Path(\"cmd_helper.py\").exists():\n",
+"    r = requests.get(url=\"https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/latest/utils/cmd_helper.py\")\n",
+"    open(\"cmd_helper.py\", \"w\").write(r.text)\n",
+"\n",
 "# Read more about telemetry collection at https://github.com/openvinotoolkit/openvino_notebooks?tab=readme-ov-file#-telemetry\n",
 "from notebook_utils import collect_telemetry\n",
 "\n",
@@ -157,21 +180,28 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"from optimum.intel.openvino import OVStableDiffusionInpaintPipeline\n",
-"from pathlib import Path\n",
+"import openvino as ov\n",
+"\n",
+"from cmd_helper import optimum_cli\n",
 "\n",
-"DEVICE = device.value\n",
 "\n",
 "MODEL_ID = \"stabilityai/stable-diffusion-2-inpainting\"\n",
 "MODEL_DIR = Path(\"sd2_inpainting\")\n",
 "\n",
-"if not MODEL_DIR.exists():\n",
-"    ov_pipe = OVStableDiffusionInpaintPipeline.from_pretrained(MODEL_ID, export=True, device=DEVICE, compile=False)\n",
-"    ov_pipe.save_pretrained(MODEL_DIR)\n",
-"else:\n",
-"    ov_pipe = OVStableDiffusionInpaintPipeline.from_pretrained(MODEL_DIR, device=DEVICE, compile=False)\n",
+"optimum_cli(MODEL_ID, MODEL_DIR, additional_args={\"weight-format\": \"fp16\"})"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"id": "a424af25",
+"metadata": {},
+"outputs": [],
+"source": [
+"import openvino_genai as ov_genai\n",
 "\n",
-"ov_pipe.compile()"
+"\n",
+"pipe = ov_genai.InpaintingPipeline(MODEL_DIR, device.value)"
 ]
 },
 {
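Taken together, the new cells amount to the following minimal GenAI inpainting sketch (assuming the model was exported to `sd2_inpainting` as above; the input file names and prompt are hypothetical):

```python
import numpy as np
import openvino as ov
import openvino_genai as ov_genai
from PIL import Image


def image_to_tensor(image: Image.Image) -> ov.Tensor:
    # Same conversion the notebook adds: RGB pixels as a 1 x H x W x 3 uint8 tensor.
    pic = image.convert("RGB")
    data = np.array(pic.getdata()).reshape(1, pic.size[1], pic.size[0], 3).astype(np.uint8)
    return ov.Tensor(data)


pipe = ov_genai.InpaintingPipeline("sd2_inpainting", "CPU")

image = image_to_tensor(Image.open("frame.png"))  # hypothetical input frame
mask = image_to_tensor(Image.open("mask.png"))    # white pixels mark the region to repaint

result = pipe.generate(
    prompt="a lush forest",  # hypothetical prompt
    image=image,
    mask_image=mask,
    num_inference_steps=25,
)
Image.fromarray(result.data[0]).save("inpainted.png")
```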
@@ -184,7 +214,7 @@
 "[back to top ⬆️](#Table-of-contents:)\n",
 "\n",
 "To achieve the zoom effect, we will use inpainting to expand images beyond their original borders.\n",
-"We run our `OVStableDiffusionInpaintPipeline` in the loop, where each next frame will add edges to previous. The frame generation process illustrated on diagram below:\n",
+"We run our `InpaintingPipeline` in a loop, where each new frame extends the edges of the previous one. The frame generation process is illustrated in the diagram below:\n",
 "\n",
 "![frame generation](https://user-images.githubusercontent.com/29454499/228739686-436f2759-4c79-42a2-a70f-959fb226834c.png)\n",
 "\n",
@@ -208,11 +238,18 @@
 "from typing import List, Union\n",
 "\n",
 "import PIL\n",
+"from PIL import Image\n",
 "import cv2\n",
 "from tqdm import trange\n",
 "import numpy as np\n",
 "\n",
 "\n",
+"def image_to_tensor(image: Image) -> ov.Tensor:\n",
+"    pic = image.convert(\"RGB\")\n",
+"    image_data = np.array(pic.getdata()).reshape(1, pic.size[1], pic.size[0], 3).astype(np.uint8)\n",
+"    return ov.Tensor(image_data)\n",
+"\n",
+"\n",
 "def generate_video(\n",
 "    pipe,\n",
 "    prompt: Union[str, List[str]],\n",
@@ -251,14 +288,19 @@
 "    mask_image = np.array(current_image)[:, :, 3]\n",
 "    mask_image = PIL.Image.fromarray(255 - mask_image).convert(\"RGB\")\n",
 "    current_image = current_image.convert(\"RGB\")\n",
-"    init_images = pipe(\n",
+"    current_image = image_to_tensor(current_image)\n",
+"    mask_image = image_to_tensor(mask_image)\n",
+"    image_tensors = pipe.generate(\n",
 "        prompt=prompt,\n",
 "        negative_prompt=negative_prompt,\n",
 "        image=current_image,\n",
 "        guidance_scale=guidance_scale,\n",
 "        mask_image=mask_image,\n",
 "        num_inference_steps=num_inference_steps,\n",
-"    ).images\n",
+"    )\n",
+"    init_images = []\n",
+"    for image_tensor in image_tensors.data:\n",
+"        init_images.append(PIL.Image.fromarray(image_tensor))\n",
 "\n",
 "    image_grid(init_images, rows=1, cols=1)\n",
 "\n",
@@ -284,15 +326,17 @@
 "\n",
 "        # inpainting step\n",
 "        current_image = current_image.convert(\"RGB\")\n",
-"        images = pipe(\n",
+"        current_image = image_to_tensor(current_image)\n",
+"        mask_image = image_to_tensor(mask_image)\n",
+"        image_tensor = pipe.generate(\n",
 "            prompt=prompt,\n",
 "            negative_prompt=negative_prompt,\n",
 "            image=current_image,\n",
 "            guidance_scale=guidance_scale,\n",
 "            mask_image=mask_image,\n",
 "            num_inference_steps=num_inference_steps,\n",
-"        ).images\n",
-"        current_image = images[0]\n",
+"        )\n",
+"        current_image = PIL.Image.fromarray(image_tensor.data[0])\n",
 "        current_image.paste(prev_image, mask=prev_image)\n",
 "\n",
 "        # interpolation steps between 2 inpainted images (=sequential zoom and crop)\n",
@@ -321,6 +365,7 @@
 "    fps = 30\n",
 "    save_path = video_file_name + \".mp4\"\n",
 "    write_video(save_path, all_frames, fps, reversed_order=zoom_in)\n",
+"\n",
 "    return save_path"
 ]
 },
@@ -453,7 +498,7 @@
 "\n",
 "from gradio_helper import make_demo_zoom_video\n",
 "\n",
-"demo = make_demo_zoom_video(ov_pipe, generate_video)\n",
+"demo = make_demo_zoom_video(pipe, generate_video)\n",
 "\n",
 "try:\n",
 "    demo.queue().launch()\n",
