
Commit 467f2d5

Add SigLIP model (#1649)
* Add siglip notebook
* Update notebook number
* Add notebook to readme
* Update notebook number
* Clean cells output & add image url
* Fix review comments
* Fix review comments
* Fix ci
* Move gradio to the end
1 parent 8063ecb commit 467f2d5

File tree

4 files changed: +782 -1 lines changed


.ci/spellcheck/.pyspelling.wordlist.txt

+3
@@ -88,6 +88,7 @@ configs
Connectionist
ContentVec
Contrastive
+contrastive
ControlNet
controlnet
ConvE
@@ -608,6 +609,8 @@ Shazeer
Shutterstock
siggraph
sigmoid
+SigLIP
+siglip
SISR
SlimOrca
SlowFast

README.md

+1 -1
@@ -234,7 +234,7 @@ Demos that demonstrate inference on a particular model.
| [279-mobilevlm-language-assistant](notebooks/279-mobilevlm-language-assistant)<br> | Mobile language assistant with MobileVLM and OpenVINO | |
| [280-depth-anything](notebooks/280-depth-anything)<br>[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/openvinotoolkit/openvino_notebooks/HEAD?filepath=notebooks%2F280-depth-anythingh%2F280-depth-anything.ipynb)<br>[![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/openvinotoolkit/openvino_notebooks/blob/main/notebooks/280-depth-anything/280-depth-anything.ipynb) | Monocular Depth Estimation with DepthAnything and OpenVINO | <img src=https://github.com/openvinotoolkit/openvino_notebooks/assets/29454499/a9a16658-512f-470c-a33c-0e1f9d0ae72c width=225> |
| [281-kosmos2-multimodal-large-language-model](notebooks/281-kosmos2-multimodal-large-language-model)<br> | Kosmos-2: Multimodal Large Language Model and OpenVINO™ | <img src=https://huggingface.co/microsoft/kosmos-2-patch14-224/resolve/main/annotated_snowman.jpg width=225> |
-
+| [282-siglip-zero-shot-image-classification](notebooks/282-siglip-zero-shot-image-classification)<br>[![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/openvinotoolkit/openvino_notebooks/blob/main/notebooks/282-siglip-zero-shot-image-classification/282-siglip-zero-shot-image-classification.ipynb) | Zero-shot Image Classification with SigLIP | <img src=https://github.com/openvinotoolkit/openvino_notebooks/assets/67365453/c4eb782c-0fef-4a89-a5c6-5cc43518490b width=500> |

<div id='-model-training'></div>
notebooks/282-siglip-zero-shot-image-classification/282-siglip-zero-shot-image-classification.ipynb

+739
Large diffs are not rendered by default.
notebooks/282-siglip-zero-shot-image-classification/README.md

+39
@@ -0,0 +1,39 @@

# Zero-shot Image Classification with SigLIP

[![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/openvinotoolkit/openvino_notebooks/blob/main/notebooks/282-siglip-zero-shot-image-classification/282-siglip-zero-shot-image-classification.ipynb)

Zero-shot image classification is a computer vision task whose goal is to classify images into one of several classes without the model having been explicitly trained on labeled examples of those classes.

![zero-shot-pipeline](https://user-images.githubusercontent.com/29454499/207773481-d77cacf8-6cdc-4765-a31b-a1669476d620.png)

In this tutorial, you will use the [SigLIP](https://huggingface.co/docs/transformers/main/en/model_doc/siglip) model to perform zero-shot image classification.
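
Conceptually, the model embeds the image and each candidate class description into a shared space and scores every image-text pair. Below is a minimal sketch of that workflow with the Hugging Face `transformers` SigLIP implementation; the checkpoint, image URL, and candidate labels are illustrative assumptions, not necessarily what the notebook uses.

```python
import requests
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

# Assumed checkpoint; the notebook may use a different SigLIP variant.
checkpoint = "google/siglip-base-patch16-224"
model = AutoModel.from_pretrained(checkpoint)
processor = AutoProcessor.from_pretrained(checkpoint)

# Illustrative image and candidate labels.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]

# SigLIP was trained with fixed-length text, hence padding="max_length".
inputs = processor(text=labels, images=image, padding="max_length", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# SigLIP scores each image-text pair independently with a sigmoid,
# rather than applying a softmax over all labels as CLIP does.
probs = torch.sigmoid(outputs.logits_per_image)
print({label: float(p) for label, p in zip(labels, probs[0])})
```
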
## Notebook Contents

This tutorial demonstrates how to perform zero-shot image classification using the open-source SigLIP model. The SigLIP model was proposed in the paper [Sigmoid Loss for Language Image Pre-Training](https://arxiv.org/abs/2303.15343). SigLIP suggests replacing the loss function used in [CLIP](https://github.com/openai/CLIP) (Contrastive Language–Image Pre-training) with a simple pairwise sigmoid loss, which results in better zero-shot classification accuracy on ImageNet.
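
For intuition, the pairwise sigmoid loss from the paper can be written down in a few lines. The sketch below is a schematic illustration of the idea, following the paper's notation of a learnable temperature `t` and bias `b`; it is not code taken from the notebook.

```python
import torch
import torch.nn.functional as F

def siglip_loss(img_emb, txt_emb, t, b):
    """Schematic pairwise sigmoid loss for n matching image/text pairs.

    img_emb, txt_emb: L2-normalized embeddings of shape (n, d);
    t is a learnable temperature, b a learnable bias.
    """
    logits = t * img_emb @ txt_emb.T + b           # (n, n) pairwise similarities
    labels = 2 * torch.eye(logits.shape[0]) - 1    # +1 for matching pairs, -1 otherwise
    # -log sigmoid(label * logit), averaged over the n images
    return F.softplus(-labels * logits).sum() / logits.shape[0]
```
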

![siglip-performance-comparison](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/siglip_table.jpeg)

[\*_image source_](https://arxiv.org/abs/2303.15343)
You can find more information about this model in the [research paper](https://arxiv.org/abs/2303.15343), the [GitHub repository](https://github.com/google-research/big_vision), and the [Hugging Face model page](https://huggingface.co/docs/transformers/main/en/model_doc/siglip).

The notebook contains the following steps (a short code sketch of the conversion and quantization steps follows the list):

1. Instantiate the model.
2. Run PyTorch model inference.
3. Convert the model to OpenVINO Intermediate Representation (IR) format.
4. Run the OpenVINO model.
5. Apply post-training quantization using [NNCF](https://github.com/openvinotoolkit/nncf):
   1. Prepare the dataset.
   2. Quantize the model.
   3. Run the quantized OpenVINO model.
   4. Compare file size.
   5. Compare inference time of the FP16 IR and quantized models.
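
The conversion and quantization steps follow the usual OpenVINO and NNCF flow. The sketch below outlines steps 3-5 under an assumed checkpoint and assumed input shapes; it is illustrative, not the notebook's exact code.

```python
import nncf
import openvino as ov
import torch
from transformers import AutoModel

# Assumed checkpoint and input shapes; the notebook may differ.
checkpoint = "google/siglip-base-patch16-224"
model = AutoModel.from_pretrained(checkpoint)
model.eval()

# Dummy example input used only to trace the PyTorch model during conversion.
example_input = {
    "input_ids": torch.ones((1, 64), dtype=torch.long),
    "pixel_values": torch.rand((1, 3, 224, 224)),
}

# Step 3: convert the PyTorch model to OpenVINO IR and save it (FP16 weights by default).
ov_model = ov.convert_model(model, example_input=example_input)
ov.save_model(ov_model, "siglip.xml")

# Step 4: compile and run the FP16 IR model.
compiled_fp16 = ov.compile_model(ov_model, "CPU")

# Step 5: post-training quantization with NNCF using a calibration dataset.
# The calibration set here is a placeholder; in practice it would contain a
# few hundred preprocessed image/text samples.
calibration_dataset = nncf.Dataset([example_input])
quantized_model = nncf.quantize(ov_model, calibration_dataset)
ov.save_model(quantized_model, "siglip_int8.xml")
compiled_int8 = ov.compile_model(quantized_model, "CPU")
```

With both IR files saved, the remaining sub-steps compare their file sizes and measure inference time for the FP16 and quantized variants.
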

The image below shows the zero-shot image classification results produced by the SigLIP model in this notebook.

![image](https://github.com/openvinotoolkit/openvino_notebooks/assets/67365453/c4eb782c-0fef-4a89-a5c6-5cc43518490b)

## Installation Instructions
If you have not installed all required dependencies, follow the [Installation Guide](../../README.md).
