|
26 | 26 | "* The model comes with a new refined depth architecture capable of preserving context from prior generation layers in an image-to-image setting. This structure preservation helps generate images that preserving forms and shadow of objects, but with different content.\n",
|
27 | 27 | "* The model comes with an updated inpainting module built upon the previous model. This text-guided inpainting makes switching out parts in the image easier than before.\n",
|
28 | 28 | "\n",
|
29 |
| - "This notebook demonstrates how to download the model from the Hugging Face Hub and converted to OpenVINO IR format with [Optimum Intel](https://huggingface.co/docs/optimum/intel/inference#stable-diffusion). And how to use the model to generate sequence of images for infinite zoom video effect.\n", |
| 29 | + "This notebook demonstrates how to download the model from the Hugging Face Hub and convert to OpenVINO IR format with the [Hugging Face Optimum Intel](https://huggingface.co/docs/optimum/intel/index) library. And how to use the model to generate sequence of images for infinite zoom video effect using [OpenVINO GenAI](https://github.com/openvinotoolkit/openvino.genai) that provides easy-to-use API.\n", |
30 | 30 | "\n",
|
31 | 31 | "\n",
|
32 | 32 | "<img referrerpolicy=\"no-referrer-when-downgrade\" src=\"https://static.scarf.sh/a.png?x-pxid=5b5a4db0-7875-4bfb-bdbd-01698b5b1a77&file=notebooks/stable-diffusion-v2/stable-diffusion-v2-infinite-zoom.ipynb\" />\n"
|
|
103 | 103 | "metadata": {},
|
104 | 104 | "outputs": [],
|
105 | 105 | "source": [
|
106 |
| - "%pip install -q \"diffusers>=0.14.0\" \"transformers>=4.25.1\" \"gradio>=4.19\" \"openvino>=2024.2.0\" \"torch>=2.1\" Pillow opencv-python \"git+https://github.com/huggingface/optimum-intel.git\" --extra-index-url https://download.pytorch.org/whl/cpu" |
| 106 | + "%pip install -q -U \"openvino>=2025.0\" \"openvino-genai>=2025.0\"\n", |
| 107 | + "%pip install -q \"diffusers>=0.14.0\" \"transformers>=4.25.1\" \"gradio>=4.19\" \"torch>=2.1\" Pillow opencv-python \"git+https://github.com/huggingface/optimum-intel.git\" --extra-index-url https://download.pytorch.org/whl/cpu" |
107 | 108 | ]
|
108 | 109 | },
|
109 | 110 | {
|
|
115 | 116 | "## Load Stable Diffusion Inpaint pipeline using Optimum Intel\n",
|
116 | 117 | "[back to top ⬆️](#Table-of-contents:)\n",
|
117 | 118 | "\n",
|
118 |
| - "We will load optimized Stable Diffusion model from the Hugging Face Hub and create pipeline to run an inference with OpenVINO Runtime by [Optimum Intel](https://huggingface.co/docs/optimum/intel/inference#stable-diffusion). \n", |
| 119 | + "[stable-diffusion-2-inpainting](https://huggingface.co/stabilityai/stable-diffusion-2-inpainting) is available for downloading via the [HuggingFace hub](https://huggingface.co/models). We will use optimum-cli interface for exporting it into OpenVINO Intermediate Representation (IR) format.\n", |
119 | 120 | "\n",
|
120 |
| - "For running the Stable Diffusion model with Optimum Intel, we will use the optimum.intel.OVStableDiffusionInpaintPipeline class, which represents the inference pipeline. OVStableDiffusionInpaintPipeline initialized by the from_pretrained method. It supports on-the-fly conversion models from PyTorch using the export=True parameter. A converted model can be saved on disk using the save_pretrained method for the next running. \n", |
| 121 | + " Optimum CLI interface for converting models supports export to OpenVINO (supported starting optimum-intel 1.12 version).\n", |
| 122 | + "General command format:\n", |
| 123 | + "\n", |
| 124 | + "```bash\n", |
| 125 | + "optimum-cli export openvino --model <model_id_or_path> --task <task> <output_dir>\n", |
| 126 | + "```\n", |
| 127 | + "\n", |
| 128 | + "where `task` is the task to export the model for, if not specified, the task will be auto-inferred based on the model.\n", |
| 129 | + "\n", |
| 130 | + "You can find a mapping between tasks and model classes in Optimum TaskManager [documentation](https://huggingface.co/docs/optimum/exporters/task_manager).\n", |
| 131 | + "\n", |
| 132 | + "Additionally, you can specify weights compression `--weight-format` for the model compression. Please note, that for INT8/INT4, it is necessary to install nncf.\n", |
| 133 | + "\n", |
| 134 | + "Full list of supported arguments available via `--help`\n", |
| 135 | + "For more details and examples of usage, please check [optimum documentation](https://huggingface.co/docs/optimum/intel/inference#export).\n", |
| 136 | + "\n", |
| 137 | + "\n", |
| 138 | + "For running the Stable Diffusion model, we will use [OpenVINO GenAI](https://github.com/openvinotoolkit/openvino.genai) that provides easy-to-use API for running text generation. Firstly we will create pipeline with `InpaintingPipeline`. You can see more details in [Image Python Generation Pipeline Example](https://github.com/openvinotoolkit/openvino.genai/tree/releases/2025/0/samples/python/image_generation#run-inpainting-pipeline).\n", |
| 139 | + "Then we run the `generate` method and get the image tokens and then convert them into the image using `Image.fromarray` from PIL. Also we convert the input images to `ov.Tensor` using `image_to_tensor` function. \n", |
121 | 140 | "\n",
|
122 | 141 | "Select device from dropdown list for running inference using OpenVINO."
|
123 | 142 | ]
|
|
138 | 157 | " )\n",
|
139 | 158 | " open(\"notebook_utils.py\", \"w\").write(r.text)\n",
|
140 | 159 | "\n",
|
| 160 | + "if not Path(\"cmd_helper.py\").exists():\n", |
| 161 | + " r = requests.get(url=\"https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/latest/utils/cmd_helper.py\")\n", |
| 162 | + " open(\"cmd_helper.py\", \"w\").write(r.text)\n", |
| 163 | + "\n", |
141 | 164 | "# Read more about telemetry collection at https://github.com/openvinotoolkit/openvino_notebooks?tab=readme-ov-file#-telemetry\n",
|
142 | 165 | "from notebook_utils import collect_telemetry\n",
|
143 | 166 | "\n",
|
|
157 | 180 | "metadata": {},
|
158 | 181 | "outputs": [],
|
159 | 182 | "source": [
|
160 |
| - "from optimum.intel.openvino import OVStableDiffusionInpaintPipeline\n", |
161 |
| - "from pathlib import Path\n", |
| 183 | + "import openvino as ov\n", |
| 184 | + "\n", |
| 185 | + "from cmd_helper import optimum_cli\n", |
162 | 186 | "\n",
|
163 |
| - "DEVICE = device.value\n", |
164 | 187 | "\n",
|
165 | 188 | "MODEL_ID = \"stabilityai/stable-diffusion-2-inpainting\"\n",
|
166 | 189 | "MODEL_DIR = Path(\"sd2_inpainting\")\n",
|
167 | 190 | "\n",
|
168 |
| - "if not MODEL_DIR.exists():\n", |
169 |
| - " ov_pipe = OVStableDiffusionInpaintPipeline.from_pretrained(MODEL_ID, export=True, device=DEVICE, compile=False)\n", |
170 |
| - " ov_pipe.save_pretrained(MODEL_DIR)\n", |
171 |
| - "else:\n", |
172 |
| - " ov_pipe = OVStableDiffusionInpaintPipeline.from_pretrained(MODEL_DIR, device=DEVICE, compile=False)\n", |
| 191 | + "optimum_cli(MODEL_ID, MODEL_DIR, additional_args={\"weight-format\": \"fp16\"})" |
| 192 | + ] |
| 193 | + }, |
| 194 | + { |
| 195 | + "cell_type": "code", |
| 196 | + "execution_count": null, |
| 197 | + "id": "a424af25", |
| 198 | + "metadata": {}, |
| 199 | + "outputs": [], |
| 200 | + "source": [ |
| 201 | + "import openvino_genai as ov_genai\n", |
173 | 202 | "\n",
|
174 |
| - "ov_pipe.compile()" |
| 203 | + "\n", |
| 204 | + "pipe = ov_genai.InpaintingPipeline(MODEL_DIR, device.value)" |
175 | 205 | ]
|
176 | 206 | },
|
177 | 207 | {
|
|
184 | 214 | "[back to top ⬆️](#Table-of-contents:)\n",
|
185 | 215 | "\n",
|
186 | 216 | "For achieving zoom effect, we will use inpainting to expand images beyond their original borders.\n",
|
187 |
| - "We run our `OVStableDiffusionInpaintPipeline` in the loop, where each next frame will add edges to previous. The frame generation process illustrated on diagram below:\n", |
| 217 | + "We run our `InpaintingPipeline` in the loop, where each next frame will add edges to previous. The frame generation process illustrated on diagram below:\n", |
188 | 218 | "\n",
|
189 | 219 | "\n",
|
190 | 220 | "\n",
|
|
208 | 238 | "from typing import List, Union\n",
|
209 | 239 | "\n",
|
210 | 240 | "import PIL\n",
|
| 241 | + "from PIL import Image\n", |
211 | 242 | "import cv2\n",
|
212 | 243 | "from tqdm import trange\n",
|
213 | 244 | "import numpy as np\n",
|
214 | 245 | "\n",
|
215 | 246 | "\n",
|
| 247 | + "def image_to_tensor(image: Image) -> ov.Tensor:\n", |
| 248 | + " pic = image.convert(\"RGB\")\n", |
| 249 | + " image_data = np.array(pic.getdata()).reshape(1, pic.size[1], pic.size[0], 3).astype(np.uint8)\n", |
| 250 | + " return ov.Tensor(image_data)\n", |
| 251 | + "\n", |
| 252 | + "\n", |
216 | 253 | "def generate_video(\n",
|
217 | 254 | " pipe,\n",
|
218 | 255 | " prompt: Union[str, List[str]],\n",
|
|
251 | 288 | " mask_image = np.array(current_image)[:, :, 3]\n",
|
252 | 289 | " mask_image = PIL.Image.fromarray(255 - mask_image).convert(\"RGB\")\n",
|
253 | 290 | " current_image = current_image.convert(\"RGB\")\n",
|
254 |
| - " init_images = pipe(\n", |
| 291 | + " current_image = image_to_tensor(current_image)\n", |
| 292 | + " mask_image = image_to_tensor(mask_image)\n", |
| 293 | + " image_tensors = pipe.generate(\n", |
255 | 294 | " prompt=prompt,\n",
|
256 | 295 | " negative_prompt=negative_prompt,\n",
|
257 | 296 | " image=current_image,\n",
|
258 | 297 | " guidance_scale=guidance_scale,\n",
|
259 | 298 | " mask_image=mask_image,\n",
|
260 | 299 | " num_inference_steps=num_inference_steps,\n",
|
261 |
| - " ).images\n", |
| 300 | + " )\n", |
| 301 | + " init_images = []\n", |
| 302 | + " for image_tensor in image_tensors.data:\n", |
| 303 | + " init_images.append(PIL.Image.fromarray(image_tensor))\n", |
262 | 304 | "\n",
|
263 | 305 | " image_grid(init_images, rows=1, cols=1)\n",
|
264 | 306 | "\n",
|
|
284 | 326 | "\n",
|
285 | 327 | " # inpainting step\n",
|
286 | 328 | " current_image = current_image.convert(\"RGB\")\n",
|
287 |
| - " images = pipe(\n", |
| 329 | + " current_image = image_to_tensor(current_image)\n", |
| 330 | + " mask_image = image_to_tensor(mask_image)\n", |
| 331 | + " image_tensor = pipe.generate(\n", |
288 | 332 | " prompt=prompt,\n",
|
289 | 333 | " negative_prompt=negative_prompt,\n",
|
290 | 334 | " image=current_image,\n",
|
291 | 335 | " guidance_scale=guidance_scale,\n",
|
292 | 336 | " mask_image=mask_image,\n",
|
293 | 337 | " num_inference_steps=num_inference_steps,\n",
|
294 |
| - " ).images\n", |
295 |
| - " current_image = images[0]\n", |
| 338 | + " )\n", |
| 339 | + " current_image = PIL.Image.fromarray(image_tensor.data[0])\n", |
296 | 340 | " current_image.paste(prev_image, mask=prev_image)\n",
|
297 | 341 | "\n",
|
298 | 342 | " # interpolation steps bewteen 2 inpainted images (=sequential zoom and crop)\n",
|
|
321 | 365 | " fps = 30\n",
|
322 | 366 | " save_path = video_file_name + \".mp4\"\n",
|
323 | 367 | " write_video(save_path, all_frames, fps, reversed_order=zoom_in)\n",
|
| 368 | + "\n", |
324 | 369 | " return save_path"
|
325 | 370 | ]
|
326 | 371 | },
|
|
453 | 498 | "\n",
|
454 | 499 | "from gradio_helper import make_demo_zoom_video\n",
|
455 | 500 | "\n",
|
456 |
| - "demo = make_demo_zoom_video(ov_pipe, generate_video)\n", |
| 501 | + "demo = make_demo_zoom_video(pipe, generate_video)\n", |
457 | 502 | "\n",
|
458 | 503 | "try:\n",
|
459 | 504 | " demo.queue().launch()\n",
|
|