Commit 2a115da

Shixiaowei02, DreamGenX, Ace-RR, bprus, and janpetrov authored Jun 18, 2024
Update TensorRT-LLM (NVIDIA#1793)
Co-authored-by: DreamGenX <x@dreamgen.com>
Co-authored-by: Ace-RR <78812427+Ace-RR@users.noreply.github.com>
Co-authored-by: bprus <39293131+bprus@users.noreply.github.com>
Co-authored-by: janpetrov <janpetrov@icloud.com>
1 parent db4edea commit 2a115da

318 files changed: +8621 -4763 lines changed

‎benchmarks/cpp/README.md

+1 -1
@@ -232,7 +232,7 @@ ${HOME}/.local/bin/trtllm-build \
     --output_dir ${LORA_ENGINE} \
     --max_batch_size ${MAX_BATCH} \
     --max_input_len $MAX_LEN \
-    --max_output_len $MAX_LEN \
+    --max_seq_len $((2*${MAX_LEN})) \
     --gemm_plugin float16 \
     --lora_plugin float16 \
     --use_paged_context_fmha enable \
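
The hunk above replaces the per-output cap with a single sequence-length cap computed by shell arithmetic. The following is a minimal sketch of that arithmetic only. It assumes, which the diff itself does not state, that `--max_seq_len` bounds input and output tokens together, so doubling `MAX_LEN` preserves the earlier budget of `MAX_LEN` input plus `MAX_LEN` output tokens. The value `1024` is an illustrative placeholder, not taken from the README.

```shell
#!/usr/bin/env bash
# Illustrative value only; the README defines MAX_LEN elsewhere.
MAX_LEN=1024

# Old flags: --max_input_len $MAX_LEN  --max_output_len $MAX_LEN
# New flag:  --max_seq_len, assumed here to cover input + output together.
MAX_SEQ_LEN=$((2*${MAX_LEN}))

echo "max_input_len=${MAX_LEN} max_seq_len=${MAX_SEQ_LEN}"
```

With `MAX_LEN=1024` this prints `max_input_len=1024 max_seq_len=2048`.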

‎benchmarks/cpp/bertBenchmark.cpp

+2 -2
@@ -17,6 +17,7 @@
 #include "tensorrt_llm/common/memoryUtils.h"
 #include "tensorrt_llm/plugins/api/tllmPlugin.h"
 #include "tensorrt_llm/runtime/iTensor.h"
+#include "tensorrt_llm/runtime/rawEngine.h"
 #include "tensorrt_llm/runtime/tllmLogger.h"
 #include "tensorrt_llm/runtime/tllmRuntime.h"
 #include "tensorrt_llm/runtime/worldConfig.h"
@@ -78,11 +79,10 @@ void benchmarkBert(std::string const& modelName, std::filesystem::path const& da
 {
     auto const worldConfig = WorldConfig::mpi();
     auto const enginePath = dataPath / engineFilename(dataPath, worldConfig, modelName);
-    auto engineBlob = loadEngine(enginePath.string());
 
     for (float gpuWeightsPercent : gpuWeightsPercents)
     {
-        auto rt = std::make_shared<TllmRuntime>(engineBlob.data(), engineBlob.size(), gpuWeightsPercent, *logger);
+        auto rt = std::make_shared<TllmRuntime>(RawEngine(enginePath), logger.get(), gpuWeightsPercent);
         rt->addContext(0);
         for (auto inLen : inLens)
         {
