Added quantization for OuteTTS (#2662)

nikita-malininn · web-flow · commit 0b4b204269cd · 2025-02-18T23:20:05.000+04:00
- Added quantization via `nncf.quantize` to the notebook
- Quantization uses the `mixed` preset and the `transformer` model type
because the model is LLM-based (LLaMa architecture)
- An ignored scope is used during quantization to work around OpenVINO
issues with optimized SDPA inference
- Quality of the quantized model can only be validated by expert
listening, because of the nature of the task (speech synthesis)
- Performance can only be validated with the `generate` pipeline
because of the model architecture
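A minimal sketch of the quantization call these bullets describe, assuming NNCF and OpenVINO are installed. The helper name `quantize_tts_llm`, its parameters, and passing pre-collected input dictionaries as calibration data are illustrative assumptions, not the notebook's actual code. Which SDPA-related nodes belong in the ignored scope is not stated here, so the caller supplies them.

```python
from typing import Any, Iterable, List, Optional


def quantize_tts_llm(
    ov_model: Any,
    calibration_inputs: Iterable[dict],
    ignored_names: Optional[List[str]] = None,
    subset_size: int = 300,
) -> Any:
    """Hypothetical helper: 8-bit post-training quantization of an OpenVINO
    LLM-style model with NNCF, using the settings listed above."""
    import nncf  # imported here so the helper can be defined without NNCF installed

    # Assumption: each item is one ready-to-use input dict for the model.
    calibration_dataset = nncf.Dataset(list(calibration_inputs))

    # Assumption: the caller passes the names of the SDPA-related nodes to skip.
    ignored_scope = nncf.IgnoredScope(names=ignored_names) if ignored_names else None

    return nncf.quantize(
        ov_model,
        calibration_dataset,
        preset=nncf.QuantizationPreset.MIXED,   # symmetric weights, asymmetric activations
        model_type=nncf.ModelType.TRANSFORMER,  # transformer-aware quantization scheme
        ignored_scope=ignored_scope,
        subset_size=subset_size,                # samples used to collect activation statistics
    )
```

With Optimum Intel, the underlying `ov.Model` of an `OVModelForCausalLM` is available as its `.model` attribute, so a call might look like `quantize_tts_llm(ov_llm.model, collected_inputs, ignored_names=sdpa_node_names)`, where `collected_inputs` and `sdpa_node_names` are hypothetical.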

Ticket: 157133
diff --git a/.ci/spellcheck/.pyspelling.wordlist.txt b/.ci/spellcheck/.pyspelling.wordlist.txt
@@ -457,6 +457,7 @@ learnable
 LeViT
 LibriSpeech
 librispeech
+LibriTTS
 Lim
 LinearCameraEmbedder
 Lippipeline
diff --git a/notebooks/outetts-text-to-speech/README.md b/notebooks/outetts-text-to-speech/README.md
@@ -2,17 +2,19 @@
 
 [OuteTTS-0.1-350M](https://huggingface.co/OuteAI/OuteTTS-0.1-350M) is a novel text-to-speech synthesis model that leverages pure language modeling without external adapters or complex architectures, built upon the LLaMa architecture. It demonstrates that high-quality speech synthesis is achievable through a straightforward approach using crafted prompts and audio tokens.
 
-More details about model can be found in [original repo](https://github.com/edwko/OuteTTS).
+More details about the model can be found in the [original repo](https://github.com/edwko/OuteTTS).
 
-In this tutorial we consider how to run OuteTTS pipeline using OpenVINO.
+In this tutorial, we consider how to run the OuteTTS pipeline using OpenVINO.
 
 ## Notebook Contents
 
 The tutorial consists of the following steps:
 
-* Convert model to OpenVINO format using Optimum Intel
-* Run Text-to-Speech synthesis using OpenVINO model
-* Run Text-to-Speech synthesis with Voice Cloning using OpenVINO model
+* Convert the model to OpenVINO format using Optimum Intel
+* Run Text-to-Speech synthesis using the OpenVINO model
+* Run Text-to-Speech synthesis with Voice Cloning using the OpenVINO model
+* Optimize the model using OpenVINO and NNCF
+* Compare Text-to-Speech synthesis quality and performance of the original and quantized models
 * Interactive demo
 
 ## Installation Instructions
diff --git a/notebooks/outetts-text-to-speech/outetts-text-to-speech.ipynb b/notebooks/outetts-text-to-speech/outetts-text-to-speech.ipynb