This sample script demonstrates how to benchmark an LLM in OpenVINO GenAI. The script supports warm-up iterations, text generation, and calculation of various performance metrics.

## Download and convert the model and tokenizers

The `--upgrade-strategy eager` option is needed to ensure `optimum-intel` is upgraded to the latest version.

It's not required to install [../../requirements.txt](../../requirements.txt) for deployment if the model has already been exported.

src/README.md (+91)
@@ -196,6 +196,97 @@ int main(int argc, char* argv[]) {
}
```

### Performance Metrics

`openvino_genai.PerfMetrics` (referred to as `PerfMetrics` for simplicity) is a structure that holds performance metrics for each `generate` call. `PerfMetrics` holds fields with the mean and standard deviation of the following metrics:
- Time to First Token (TTFT), ms
- Time per Output Token (TPOT), ms/token
- Generate total duration, ms
- Tokenization duration, ms
- Detokenization duration, ms
- Throughput, tokens/s

and:
- Load time, ms
- Number of generated tokens
- Number of tokens in the input prompt

Performance metrics are stored in the `perf_metrics` field of either `DecodedResults` or `EncodedResults`. In addition to the fields mentioned above, `PerfMetrics` has a member `raw_metrics` of type `openvino_genai.RawPerfMetrics` (referred to as `RawPerfMetrics` for simplicity) that contains raw values for the durations of each batch of new token generation, tokenization durations, detokenization durations, and more. These raw metrics are accessible if you wish to calculate your own statistical values such as median or percentiles. However, since mean and standard deviation values are usually sufficient, we will focus on `PerfMetrics`.
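
For readers who do want custom statistics, the following is a minimal sketch of computing a median and a 90th percentile from raw durations. It assumes the durations have already been copied out of `raw_metrics` into a plain Python list; the list name `token_durations_ms`, the helper `summarize`, and the values themselves are hypothetical and used only for illustration. The population standard deviation is used here as an assumption, not as a statement about how `PerfMetrics` computes its own values.

```python
import statistics

# Hypothetical per-token durations in milliseconds. In practice these values
# would be copied out of `raw_metrics`; the numbers here are made up.
token_durations_ms = [41.0, 39.5, 40.2, 95.0, 38.9, 40.8, 39.7, 41.3]


def summarize(durations_ms):
    """Return mean, standard deviation, median, and 90th percentile."""
    # quantiles(n=10) returns the 9 cut points that split the data into
    # deciles; the last cut point is the 90th percentile.
    deciles = statistics.quantiles(durations_ms, n=10, method="inclusive")
    return {
        "mean": statistics.mean(durations_ms),
        "std": statistics.pstdev(durations_ms),  # population std (assumption)
        "median": statistics.median(durations_ms),
        "p90": deciles[-1],
    }


summary = summarize(token_durations_ms)
for name, value in summary.items():
    print(f"{name}: {value:.2f} ms")
```

In this made-up data set a single slow token pulls the mean well above the median, which is the kind of effect that percentiles expose and a mean alone hides.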

```python
import openvino_genai as ov_genai

# model_path: directory with the exported OpenVINO model
pipe = ov_genai.LLMPipeline(model_path, "CPU")
result = pipe.generate(["The Sun is yellow because"], max_new_tokens=20)
perf_metrics = result.perf_metrics
```

>**Note**: In Python, if the input prompt is just a string, the `generate` function returns only a string without `perf_metrics`. To obtain `perf_metrics`, provide the prompt as a list with at least one element or call `generate` with encoded inputs.

Several `perf_metrics` can be added to each other. In that case `raw_metrics` are concatenated and mean/std values are recalculated. This accumulates statistics from several `generate()` calls.

```cpp
#include "openvino/genai/llm_pipeline.hpp"
#include <iostream>

int main(int argc, char* argv[]) {
    std::string model_path = argv[1];
    ov::genai::LLMPipeline pipe(model_path, "CPU");
    auto result_1 = pipe.generate("The Sun is yellow because", ov::genai::max_new_tokens(20));
    auto result_2 = pipe.generate("The Sun is yellow because", ov::genai::max_new_tokens(20));
    // Adding two PerfMetrics accumulates the statistics of both generate() calls
    auto perf_metrics = result_1.perf_metrics + result_2.perf_metrics;

    // Report one accumulated statistic, e.g. the mean TTFT
    std::cout << "TTFT: " << perf_metrics.get_ttft().mean << " ms" << std::endl;
}
```
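
To illustrate why the raw values are concatenated before the mean and standard deviation are recalculated, here is a small standalone sketch. `MetricsSketch` is a hypothetical stand-in written for this example and is not the OpenVINO GenAI class; it tracks a single list of durations, and the use of the population standard deviation is an assumption.

```python
from dataclasses import dataclass, field
import statistics


@dataclass
class MetricsSketch:
    """Hypothetical stand-in for a metrics object; not the OpenVINO GenAI class."""

    raw_durations_ms: list = field(default_factory=list)

    @property
    def mean(self):
        return statistics.mean(self.raw_durations_ms)

    @property
    def std(self):
        # Population standard deviation (assumption made for this sketch)
        return statistics.pstdev(self.raw_durations_ms)

    def __add__(self, other):
        # Concatenate the raw values; mean/std are derived from the merged list.
        return MetricsSketch(self.raw_durations_ms + other.raw_durations_ms)


first = MetricsSketch([40.0, 42.0, 44.0])   # made-up values for call 1
second = MetricsSketch([50.0, 54.0])        # made-up values for call 2
total = first + second
print(f"mean: {total.mean:.2f} ms, std: {total.std:.2f} ms")
```

With these made-up numbers the accumulated mean is 46 ms, whereas simply averaging the two per-call means (42 ms and 52 ms) would give 47 ms. Recomputing from the concatenated raw values weights every measurement equally, regardless of how many measurements each call contributed.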

For more examples of how metrics are used, please refer to the Python [benchmark_genai.py](https://github.com/openvinotoolkit/openvino.genai/tree/releases/2024/3/samples/python/benchmark_genai/README.md) and C++ [benchmark_genai](https://github.com/openvinotoolkit/openvino.genai/tree/releases/2024/3/samples/cpp/benchmark_genai/README.md) samples.

## How It Works

For information on how OpenVINO™ GenAI works, refer to the [How It Works Section](https://github.com/openvinotoolkit/openvino.genai/tree/releases/2024/2/src/docs/HOW_IT_WORKS.md).