Here we provide examples of Torch-TensorRT compilation of popular computer vision and language models.

Torch-TensorRT provides a backend for the new ``torch.compile`` API released in PyTorch 2.0. In the following examples we describe a number of ways you can leverage this backend to accelerate inference.
Dependencies
------------------------------------
Please install the following external dependencies (assuming you already have the correct ``torch``, ``torch_tensorrt``, and ``tensorrt`` libraries installed; see `dependencies <https://github.com/pytorch/TensorRT?tab=readme-ov-file#dependencies>`_):

.. code-block:: sh

    pip install -r requirements.txt
Model Zoo
------------------------------------

* :ref:`torch_compile_resnet`: Compiling a ResNet model using the Torch Compile Frontend for ``torch_tensorrt.compile``
* :ref:`torch_compile_transformer`: Compiling a Transformer model using ``torch.compile``
* :ref:`torch_compile_advanced_usage`: Advanced usage including making a custom backend to use directly with the ``torch.compile`` API
* :ref:`torch_compile_stable_diffusion`: Compiling a Stable Diffusion model using ``torch.compile``
* :ref:`torch_export_cudagraphs`: Using the Cudagraphs integration with ``ir="dynamo"``
* :ref:`custom_kernel_plugins`: Creating a plugin to use a custom kernel inside TensorRT engines
* :ref:`refit_engine_example`: Refitting a compiled TensorRT Graph Module with updated weights
* :ref:`mutable_torchtrt_module_example`: Compiling, using, and modifying a TensorRT Graph Module with ``MutableTorchTensorRTModule``
* :ref:`vgg16_fp8_ptq`: Compiling a VGG16 model with FP8 and PTQ using ``torch.compile``
* :ref:`engine_caching_example`: Utilizing engine caching to speed up compilation times
* :ref:`engine_caching_bert_example`: Demonstrating engine caching on BERT
* :ref:`torch_export_gpt2`: Compiling a GPT2 model using the AOT workflow (``ir="dynamo"``)
* :ref:`torch_export_llama2`: Compiling a Llama2 model using the AOT workflow (``ir="dynamo"``)
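As a minimal illustration of the ``torch.compile`` backend workflow that the examples above build on, the sketch below compiles a small placeholder model. It assumes ``torch_tensorrt`` is installed and a CUDA-capable GPU is available; on CPU it simply falls back to eager execution so the shapes can still be checked.

```python
import torch

# A small placeholder model; any nn.Module is handled the same way.
model = torch.nn.Sequential(torch.nn.Linear(8, 4), torch.nn.ReLU()).eval()
x = torch.randn(1, 8)

if torch.cuda.is_available():
    import torch_tensorrt  # noqa: F401  (importing registers the "tensorrt" backend)

    model, x = model.cuda(), x.cuda()
    # Route torch.compile through the Torch-TensorRT backend
    model = torch.compile(model, backend="tensorrt")

out = model(x)
print(out.shape)  # torch.Size([1, 4])
```

The first call triggers compilation; subsequent calls with the same input shapes reuse the built TensorRT engine.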
This interactive script is intended as a sample of the Torch-TensorRT workflow with ``torch.compile`` on a GPT2 model. A sample output is featured below:

.. code-block:: python

    # Pytorch model generated text: The parallel programming paradigm is a set of programming languages that are designed to be used in parallel. The main difference between parallel programming and parallel programming is that

    # =============================

    # TensorRT model generated text: The parallel programming paradigm is a set of programming languages that are designed to be used in parallel. The main difference between parallel programming and parallel programming is that
This script illustrates the Torch-TensorRT workflow with the dynamo backend on the popular Llama2 model. The output sentences should look like the following:

.. code-block:: python

    # Prompt : What is dynamic programming?

    # =============================
    # Pytorch model generated text: Dynamic programming is an algorithmic technique used to solve complex problems by breaking them down into smaller subproblems, solving each subproblem only once, and

    # =============================
    # TensorRT model generated text: Dynamic programming is an algorithmic technique used to solve complex problems by breaking them down into smaller subproblems, solving each subproblem only once, and
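The definition in the generated text above (break a problem into subproblems and solve each subproblem only once) can be made concrete with a short memoized Fibonacci sketch. This is illustrative only and not part of the example scripts:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n: int) -> int:
    # Each subproblem fib(k) is computed once, then reused from the cache.
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(30))                    # 832040
print(fib.cache_info().currsize)  # 31 distinct subproblems cached (n = 0..30)
```

Without memoization the same call would recompute overlapping subproblems exponentially many times; caching reduces it to one evaluation per distinct ``n``.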