deepseek-r1 Notebook: Add Option for DeepSeek-R1-Distill-Qwen-32B Model (#2732)
- #2718
- The 32B variant is the one comparable in capability to OpenAI o1-mini (~100B).
- Tested with OpenVINO 2024.6 on an ARL-S 285K CPU with 32 GB RAM and a 500 GB drive (200 GB for the swap file and 200 GB to store the model). Also tested successfully on an ARL-H 285H iGPU with 64 GB RAM.
- Tested with OpenVINO 2025.0 on the ARL-S 285K CPU as well.
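
For context, the INT4 conversion these requirements refer to can be reproduced with optimum-intel's OpenVINO export. A minimal sketch follows; the notebook's exact arguments may differ, and the `group_size`/`ratio` values and output directory name are illustrative assumptions:

```python
# Minimal sketch (not the notebook's exact code): INT4 weight compression of
# DeepSeek-R1-Distill-Qwen-32B via optimum-intel. group_size/ratio and the
# output directory name are illustrative assumptions.
from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"

# 4-bit asymmetric weight compression; because the source model is ~65 GB,
# this step needs roughly 32 GB RAM plus a large swap file and ~200 GB of
# free disk to hold the original and converted models.
quant_config = OVWeightQuantizationConfig(bits=4, sym=False, group_size=128, ratio=1.0)

model = OVModelForCausalLM.from_pretrained(
    model_id,
    export=True,                       # convert from the Hugging Face checkpoint
    quantization_config=quant_config,  # compress weights to INT4 during export
)
model.save_pretrained("DeepSeek-R1-Distill-Qwen-32B-int4-ov")
```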
notebooks/deepseek-r1/README.md (+1)
@@ -12,6 +12,7 @@ The tutorial supports different models, you can select one from the provided opt
  * **DeepSeek-R1-Distill-Qwen-1.5B** is the smallest DeepSeek-R1 distilled model, based on [Qwen2.5-Math-1.5B](https://huggingface.co/Qwen/Qwen2.5-Math-1.5B). Despite its compact size, the model demonstrates strong capabilities in solving basic mathematical tasks, while its programming capabilities are limited. Check the [model card](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) for more info.
  * **DeepSeek-R1-Distill-Qwen-7B** is a distilled model based on [Qwen2.5-Math-7B](https://huggingface.co/Qwen/Qwen2.5-Math-7B). The model demonstrates a good balance between mathematical and factual reasoning, but can be less suited for complex coding tasks. Check the [model card](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B) for more info.
  * **DeepSeek-R1-Distill-Qwen-14B** is a distilled model based on [Qwen2.5-14B](https://huggingface.co/Qwen/Qwen2.5-14B) that has great competence in factual reasoning and solving complex mathematical tasks. Check the [model card](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B) for more info.
+ * **DeepSeek-R1-Distill-Qwen-32B** is a distilled model based on [Qwen2.5-32B](https://huggingface.co/Qwen/Qwen2.5-32B) with capability comparable to OpenAI o1-mini. Check the [model card](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) for more info. As the original model is about 65 GB, quantizing it to INT4 requires 32 GB of RAM (plus a 200 GB swap file) and another 200 GB of storage to save the models. The INT4-quantized model is about 16 GB and requires 32 GB of RAM for inference on CPU, or 64 GB of RAM on iGPU.

  Learn how to accelerate **DeepSeek-R1-Distill-Llama-8B** with **FastDraft** and the OpenVINO GenAI speculative decoding pipeline in this [notebook](../../supplementary_materials/notebooks/fastdraft-deepseek/fastdraft_deepseek.ipynb)
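
As a rough illustration of the speculative decoding pipeline referenced above, here is a minimal OpenVINO GenAI sketch; the model directory names and the assistant-token count are assumptions, and the linked notebook shows the actual FastDraft setup:

```python
# Minimal sketch of OpenVINO GenAI speculative decoding, assuming converted
# main and draft model directories; see the FastDraft notebook for specifics.
import openvino_genai

main_model_dir = "DeepSeek-R1-Distill-Llama-8B-int4-ov"  # assumed path
draft_model_dir = "fastdraft-llama-draft-ov"             # assumed path

pipe = openvino_genai.LLMPipeline(
    main_model_dir,
    "CPU",
    draft_model=openvino_genai.draft_model(draft_model_dir, "CPU"),
)

config = openvino_genai.GenerationConfig()
config.max_new_tokens = 128
config.num_assistant_tokens = 5  # draft tokens proposed per step

print(pipe.generate("Explain speculative decoding in one sentence.", config))
```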
notebooks/deepseek-r1/deepseek-r1.ipynb (+1)
@@ -110,6 +110,7 @@
  "* **DeepSeek-R1-Distill-Qwen-1.5B** is the smallest DeepSeek-R1 distilled model, based on [Qwen2.5-Math-1.5B](https://huggingface.co/Qwen/Qwen2.5-Math-1.5B). Despite its compact size, the model demonstrates strong capabilities in solving basic mathematical tasks, while its programming capabilities are limited. Check the [model card](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) for more info.\n",
  "* **DeepSeek-R1-Distill-Qwen-7B** is a distilled model based on [Qwen2.5-Math-7B](https://huggingface.co/Qwen/Qwen2.5-Math-7B). The model demonstrates a good balance between mathematical and factual reasoning, but can be less suited for complex coding tasks. Check the [model card](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B) for more info.\n",
  "* **DeepSeek-R1-Distill-Qwen-14B** is a distilled model based on [Qwen2.5-14B](https://huggingface.co/Qwen/Qwen2.5-14B) that has great competence in factual reasoning and solving complex mathematical tasks. Check the [model card](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B) for more info.\n",
+ "* **DeepSeek-R1-Distill-Qwen-32B** is a distilled model based on [Qwen2.5-32B](https://huggingface.co/Qwen/Qwen2.5-32B) with capability comparable to OpenAI o1-mini. Check the [model card](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) for more info. As the original model is about 65 GB, quantizing it to INT4 requires 32 GB of RAM (plus a 200 GB swap file) and another 200 GB of storage to save the models. The INT4-quantized model is about 16 GB and requires 32 GB of RAM for inference on CPU, or 64 GB of RAM on iGPU.\n",
  "\n",
  "[Weight compression](https://docs.openvino.ai/2024/openvino-workflow/model-optimization-guide/weight-compression.html) is a technique for enhancing the efficiency of models, especially those with large memory requirements. This method reduces the model’s memory footprint, a crucial factor for Large Language Models (LLMs). We provide several options for model weight compression:\n",