TensorRT-LLM Release 0.13.0
Key Features and Enhancements
docs/source/speculative_decoding.md.
ModelWeightsLoader (a unified checkpoint converter, see docs/source/architecture/model-weights-loader.md).
*.bin and *.pth.
LLM class.
trust_remote_code for customized models and tokenizers downloaded from Hugging Face Hub.
curand and bfloat16 support for ReDrafter.
ModelRunnerCpp class.
head_size=48 cases for FMHA kernels.
examples/dit/README.md.
executor API.
API Changes
use_fused_mlp to True by default.
multi_block_mode by default.
strongly_typed by default in builder API.
maxNewTokens, randomSeed and minLength to maxTokens, seed and minTokens following OpenAI style.
LLM class
LLM.generate arguments to include PromptInputs and tqdm.
executor API
LogitsPostProcessorConfig.
FinishReason to Result.
Model Updates
examples/gemma/README.md.
Fixed Issues
remove_input_padding is enabled #1999)
smoothquant. (convert qwen2-0.5b-instruct failed when using smoothquant #2087)
exclude_modules pattern in convert_utils.py to the changes in quantize.py. ([Fix] Match exclude_modules pattern in convert_utils.py to quantize.py changes. #2113)
FORCE_NCCL_ALL_REDUCE_STRATEGY is set.
gpt_attention.
LoraConfig. (in python 3.11 and release 0.8.0:ValueError: mutable default <class 'datasets.utils.version.Version'> for field version is not allowed: use default_factory #1323)
Infrastructure Changes
nvcr.io/nvidia/pytorch:24.07-py3.
nvcr.io/nvidia/tritonserver:24.07-py3.
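The API Changes entry that moves maxNewTokens, randomSeed and minLength to maxTokens, seed and minTokens may affect settings stored outside the code, for example in a JSON config. The sketch below shows one way such stored settings could be migrated. It assumes the settings live in a plain Python dict; the helper name migrate_sampling_keys and the OLD_TO_NEW table are hypothetical and are not part of TensorRT-LLM.

```python
# Hypothetical migration helper; NOT part of TensorRT-LLM.
# Assumption: sampling settings are kept in a plain dict (e.g. loaded from JSON).

# Old key -> new key, taken from the rename listed under API Changes.
OLD_TO_NEW = {
    "maxNewTokens": "maxTokens",
    "randomSeed": "seed",
    "minLength": "minTokens",
}


def migrate_sampling_keys(params: dict) -> dict:
    """Return a copy of params with the old key names replaced by the new ones."""
    migrated = {}
    for key, value in params.items():
        new_key = OLD_TO_NEW.get(key, key)  # unknown keys pass through unchanged
        if new_key in migrated:
            # Both the old and the new spelling were supplied; refuse to guess.
            raise ValueError(f"duplicate setting for {new_key!r}")
        migrated[new_key] = value
    return migrated
```

Raising on a duplicate, rather than silently preferring one spelling, keeps a half-migrated config from hiding a conflicting value.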