"Import Error: /libs/libth_common.so: Undefined Symbol" While Building #808

eurus-ch · 2024-01-04T06:10:07Z

Hi,

While trying to run this:

python build.py --model_dir $model_dir$ \
                --dtype float16 \
                --use_gpt_attention_plugin float16 \
                --use_gemm_plugin float16 \
                --max_batch_size 4 \
                --max_input_len 128 \
                --max_output_len 128

we run into this fatal error, a strange undefined symbol:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/_common.py", line 56, in _init
    torch.classes.load_library(ft_decoder_lib)
  File "/usr/local/lib/python3.10/dist-packages/torch/_classes.py", line 51, in load_library
    torch.ops.load_library(path)
  File "/usr/local/lib/python3.10/dist-packages/torch/_ops.py", line 841, in load_library
    ctypes.CDLL(path)
  File "/usr/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libth_common.so: undefined symbol: _ZN5torch6detail10class_baseC2ERKSsS3_SsRKSt9type_infoS6_

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "TensorRT-LLM/examples/llama/build.py", line 33, in <module>
    from weight import (get_scaling_factors, load_from_awq_llama, load_from_binary,
  File "TensorRT-LLM/examples/llama/weight.py", line 24, in <module>
    import tensorrt_llm
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/__init__.py", line 61, in <module>
    _init(log_level="error")
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/_common.py", line 59, in _init
    raise ImportError(str(e) + msg)
ImportError: /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libth_common.so: undefined symbol: _ZN5torch6detail10class_baseC2ERKSsS3_SsRKSt9type_infoS6_
FATAL: Decoding operators failed to load. This may be caused by the incompatibility between PyTorch and TensorRT-LLM. Please rebuild and install TensorRT-LLM.

Before that, we built the wheel with:

python3 ./scripts/build_wheel.py --clean  --trt_root /usr/local/tensorrt

And the software versions are

tensorboard               2.9.0
tensorboard-data-server   0.6.1
tensorboard-plugin-wit    1.8.1
tensorrt                  9.2.0.post12.dev5
tensorrt-llm              0.7.1
torch-tensorrt            0.0.0
pytorch-quantization      2.1.2
torch                     2.1.0a0+32f93b1
torch-tensorrt            0.0.0
torchdata                 0.7.0a0
torchtext                 0.16.0a0
torchvision               0.16.0a0

Do you have any clue how to solve this? Many thanks!


Shixiaowei02 · 2024-01-04T06:34:57Z

Please ensure that you build and run TensorRT-LLM in the same environment. Alternatively, you can try building TensorRT-LLM in a Docker container by executing this command:

make -C docker release_build

Thank you!
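
A build/run mismatch of this kind can sometimes be read off the undefined symbol itself. The following is a minimal sketch, assuming GCC/libstdc++ name mangling on Linux; the helper name `guess_string_abi` is hypothetical and the check is a plain substring heuristic, not a demangler. It relies on two mangling facts: `Ss` abbreviates the pre-C++11-ABI `std::string`, while the newer ABI appears as `__cxx11` or a `B5cxx11` tag.

```python
def guess_string_abi(mangled_symbol: str) -> str:
    """Heuristically guess which libstdc++ string ABI a mangled symbol refers to.

    Returns "cxx11", "pre-cxx11", or "unknown". This is a substring heuristic
    (an assumption of this sketch), so treat the answer as a hint only.
    """
    # The newer ABI shows up as the `__cxx11` namespace or a `B5cxx11` ABI tag.
    if "cxx11" in mangled_symbol:
        return "cxx11"
    # `Ss` is the Itanium-mangling abbreviation for the old std::string type.
    if "Ss" in mangled_symbol:
        return "pre-cxx11"
    # Symbols with no std::string in their signature carry no ABI hint.
    return "unknown"


if __name__ == "__main__":
    # Symbol copied from the traceback earlier in this thread.
    print(guess_string_abi(
        "_ZN5torch6detail10class_baseC2ERKSsS3_SsRKSt9type_infoS6_"))
```

If the library asks for a "pre-cxx11" symbol, it was probably compiled against a PyTorch build using the old string ABI. The installed PyTorch can be checked with `torch.compiled_with_cxx11_abi()`; if the two disagree, the library and the runtime PyTorch likely come from different environments.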

eurus-ch · 2024-01-04T08:37:02Z

With tensorrt-llm 0.6.1, the error changes to this:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/pynvml/nvml.py", line 850, in _nvmlGetFunctionPointer
    _nvmlGetFunctionPointer_cache[name] = getattr(nvmlLib, name)
  File "/usr/lib/python3.10/ctypes/__init__.py", line 387, in __getattr__
    func = self.__getitem__(name)
  File "/usr/lib/python3.10/ctypes/__init__.py", line 392, in __getitem__
    func = self._FuncPtr((name_or_ordinal, self))
AttributeError: /lib/x86_64-linux-gnu/libnvidia-ml.so.1: undefined symbol: nvmlDeviceGetMemoryInfo_v2

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/TensorRT-LLM/./examples/llama/build.py", line 906, in <module>
    build(0, args)
  File "/TensorRT-LLM/./examples/llama/build.py", line 850, in build
    engine = build_rank_engine(builder, builder_config, engine_name,
  File "/TensorRT-LLM/./examples/llama/build.py", line 609, in build_rank_engine
    profiler.print_memory_usage(f'Rank {rank} Engine build starts')
  File "/TensorRT-LLM/tensorrt_llm/profiler.py", line 197, in print_memory_usage
    alloc_device_mem, _, _ = device_memory_info(device=device)
  File "/TensorRT-LLM/tensorrt_llm/profiler.py", line 148, in device_memory_info
    mem_info = _device_get_memory_info_fn(handle)
  File "/usr/local/lib/python3.10/dist-packages/pynvml/nvml.py", line 2438, in nvmlDeviceGetMemoryInfo
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetMemoryInfo_v2")
  File "/usr/local/lib/python3.10/dist-packages/pynvml/nvml.py", line 853, in _nvmlGetFunctionPointer
    raise NVMLError(NVML_ERROR_FUNCTION_NOT_FOUND)
pynvml.nvml.NVMLError_FunctionNotFound: Function Not Found

Thank you, but I'm already developing inside a Docker container, and building another container from within it seems to be restricted, so...

woskii · 2024-01-08T09:00:59Z

ImportError: /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libth_common.so: undefined symbol: _ZN5torch6detail10class_baseC2ERKSsS3_SsRKSt9type_infoS6_
FATAL: Decoding operators failed to load. This may be caused by the incompatibility between PyTorch and TensorRT-LLM. Please rebuild and install TensorRT-LLM.

===================================================================
I solved this error by manually installing PyTorch 2.1.0, with a command like this:
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu121
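
A quick way to see whether an environment already has that tagged release is to compare version strings. The sketch below is illustrative only: `release_tuple` and `matches_pin` are hypothetical helper names, and the rule that a pre-release build (such as the `2.1.0a0+32f93b1` listed in the original report) does not count as the tagged release is an assumption of this sketch, not a documented TensorRT-LLM check.

```python
import re


def release_tuple(version: str):
    """Return the numeric release part of a version string as a tuple of ints.

    Returns None for pre-release builds such as '2.1.0a0+32f93b1', which this
    sketch treats as a different binary from the tagged release.
    """
    public = version.split("+", 1)[0]  # drop a local tag such as '+cu121'
    if not re.fullmatch(r"\d+(\.\d+)*", public):
        return None  # e.g. '2.1.0a0' carries a pre-release suffix
    return tuple(int(part) for part in public.split("."))


def matches_pin(installed: str, pinned: str) -> bool:
    """True if `installed` is the tagged release named by `pinned`."""
    installed_release = release_tuple(installed)
    return installed_release is not None and installed_release == release_tuple(pinned)


if __name__ == "__main__":
    # In practice the first argument would be torch.__version__.
    print(matches_pin("2.1.0+cu121", "2.1.0"))
    print(matches_pin("2.1.0a0+32f93b1", "2.1.0"))
```

A matching version string is only a necessary condition: a later comment in this thread reports a similar undefined-symbol error with torch 2.1.0+cu121 installed, so the library may still need to be rebuilt against the PyTorch that is actually present.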

ekagra-ranjan · 2024-01-11T20:47:45Z

raise NVMLError(NVML_ERROR_FUNCTION_NOT_FOUND)
pynvml.nvml.NVMLError_FunctionNotFound: Function Not Found

I too faced this issue. This was the fix: NVIDIA/k8s-device-plugin#331 (comment)

dongteng · 2024-02-02T01:40:10Z

ImportError: /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libth_common.so: undefined symbol: _ZN5torch6detail10class_baseC2ERKSsS3_SsRKSt9type_infoS6_
FATAL: Decoding operators failed to load. This may be caused by the incompatibility between PyTorch and TensorRT-LLM. Please rebuild and install TensorRT-LLM.

===================================================================
I solved this error by manually installing pytorch 2.1.0
Command like this:
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu121

Thanks

AbhisKmr · 2024-04-01T21:26:01Z

I'm still facing the same issue.

ImportError: /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libth_common.so: undefined symbol: _ZN5torch6detail10class_baseC2ERKSsS3_SsRKSt9type_infoS6_
FATAL: Decoding operators failed to load. This may be caused by the incompatibility between PyTorch and TensorRT-LLM. Please rebuild and install TensorRT-LLM.

===================================================================
I solved this error by manually installing pytorch 2.1.0
Command like this:
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu121

I'm still getting the same error after doing this. My environment is:
attrs 23.2.0
av 10.0.0
bcrypt 4.1.2
braceexpand 0.1.7
certifi 2020.6.20
cffi 1.16.0
chardet 4.0.0
charset-normalizer 3.3.2
coloredlogs 15.0.1
cryptography 42.0.5
ctranslate2 3.24.0
dbus-python 1.2.16
distro 1.9.0
distro-info 1.0+deb11u1
docker 7.0.0
docker-compose 1.29.2
dockerpty 0.4.1
docopt 0.6.2
einops 0.7.0
encodec 0.1.1
fastcore 1.5.29
faster-whisper 0.9.0
fastprogress 1.0.3
ffmpeg-python 0.2.0
filelock 3.13.3
flatbuffers 24.3.25
fsspec 2024.3.1
future 1.0.0
httplib2 0.18.1
huggingface-hub 0.17.3
humanfriendly 10.0
HyperPyYAML 1.2.2
idna 2.10
Jinja2 3.1.3
joblib 1.3.2
jsonschema 3.2.0
kaldialign 0.9.1
llvmlite 0.42.0
MarkupSafe 2.1.5
more-itertools 10.2.0
mpmath 1.3.0
networkx 3.2.1
numba 0.59.1
numpy 1.26.4
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 8.9.2.26
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-nccl-cu12 2.19.3
nvidia-nvjitlink-cu12 12.4.99
nvidia-nvtx-cu12 12.1.105
nvidia-pyindex 1.0.9
onnxruntime 1.16.0
openai-whisper 20231117
packaging 24.0
paramiko 3.4.0
pillow 10.2.0
pip 20.3.4
protobuf 5.26.1
pycparser 2.22
pycurl 7.43.0.6
PyGObject 3.38.0
PyNaCl 1.5.0
pyrsistent 0.20.0
PySimpleSOAP 1.16.2
python-apt 2.2.1
python-debian 0.1.39
python-debianbts 3.1.0
python-dotenv 0.21.1
python-snappy 0.5.3
PyYAML 5.4.1
regex 2023.12.25
reportbug 7.10.3+deb11u1
requests 2.31.0
ruamel.yaml 0.18.6
ruamel.yaml.clib 0.2.8
scipy 1.12.0
sentencepiece 0.2.0
setuptools 52.0.0
six 1.16.0
soundfile 0.12.1
speechbrain 0.5.16
sympy 1.12
tensorrt 8.6.1.post1
tensorrt-bindings 8.6.1
tensorrt-libs 8.6.1
texttable 1.7.0
tiktoken 0.3.3
tokenizers 0.14.1
torch 2.1.0+cu121
torchaudio 2.1.0+cu121
torchvision 0.16.0+cu121
tqdm 4.66.2
triton 2.1.0
typing-extensions 4.10.0
unattended-upgrades 0.1
urllib3 1.26.5
vocos 0.1.0
websocket-client 0.59.0
websockets 12.0
wheel 0.34.2
WhisperSpeech 0.8

Development hardware: Google Cloud

Error message:

FATAL: Decoding operators failed to load. This may be caused by the incompatibility between PyTorch and TensorRT-LLM. Please rebuild and install TensorRT-LLM.
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/_common.py", line 58, in _init
    torch.classes.load_library(ft_decoder_lib)
  File "/usr/local/lib/python3.10/dist-packages/torch/_classes.py", line 51, in load_library
    torch.ops.load_library(path)
  File "/usr/local/lib/python3.10/dist-packages/torch/_ops.py", line 933, in load_library
    ctypes.CDLL(path)
  File "/usr/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libth_common.so: undefined symbol: _ZN3c1017RegisterOperatorsD1Ev

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/WhisperFusion/main.py", line 11, in <module>
    from whisper_live.trt_server import TranscriptionServer
  File "/root/WhisperFusion/whisper_live/trt_server.py", line 17, in <module>
    from whisper_live.trt_transcriber import WhisperTRTLLM
  File "/root/WhisperFusion/whisper_live/trt_transcriber.py", line 16, in <module>
    import tensorrt_llm
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/__init__.py", line 64, in <module>
    _init(log_level="error")
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/_common.py", line 61, in _init
    raise ImportError(str(e) + msg)
ImportError: /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libth_common.so: undefined symbol: _ZN3c1017RegisterOperatorsD1Ev
FATAL: Decoding operators failed to load. This may be caused by the incompatibility between PyTorch and TensorRT-LLM. Please rebuild and install TensorRT-LLM.

- There is a bug that was fixed in the 526 driver release. For older driver versions, the recommendation is to downgrade pynvml to 11.4.0 and to use 11.5.0 only with driver 526 or later. The fix uses the legacy pynvml memory usage function, even with pynvml 11.5.0, if the driver version is older than 526. This is mentioned in the issue as well: NVIDIA#808 (comment)
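
The driver-version rule described above can be sketched as a small check. This is a hedged illustration, not the code from the referenced fix: `driver_major` and `needs_legacy_memory_info` are hypothetical names, and the threshold of 526 is taken from the note above. In a real environment the version string would typically come from `pynvml.nvmlSystemGetDriverVersion()`.

```python
def driver_major(driver_version) -> int:
    """Extract the major number from a driver version such as '525.105.17'."""
    if isinstance(driver_version, bytes):  # some pynvml releases return bytes
        driver_version = driver_version.decode()
    return int(driver_version.split(".", 1)[0])


def needs_legacy_memory_info(driver_version, fixed_in_major: int = 526) -> bool:
    """True when the driver predates the release said to contain the fix."""
    return driver_major(driver_version) < fixed_in_major


if __name__ == "__main__":
    # Example version strings chosen for illustration only.
    for version in ("525.105.17", "535.54.03"):
        choice = "legacy" if needs_legacy_memory_info(version) else "v2"
        print(version, "->", choice, "memory info call")
```

Keeping the threshold as a parameter makes the assumption explicit and easy to change if the fixed driver release turns out to differ.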

nv-guomingz · 2024-11-15T17:20:33Z

Please feel free to reopen this ticket if needed.

DeekshithaDPrakash · 2024-11-25T06:43:37Z

+------------------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Model            | Version | Status                                                                                                                                                                                                      |
+------------------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| postprocessing   | 1       | READY                                                                                                                                                                                                       |
| preprocessing    | 1       | READY                                                                                                                                                                                                       |
| tensorrt_llm     | 1       | UNAVAILABLE: Not found: unable to load shared library: /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm_common.so: undefined symbol: _ZNK12tensorrt_llm8executor8Response11getErrorMsgB5cxx11Ev |
| tensorrt_llm_bls | 1       | READY                                                                                                                                                                                                       |
+------------------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I get the same kind of error when I run launch_triton_server.py.

Shixiaowei02 self-assigned this Jan 4, 2024

Shixiaowei02 added the "triaged" label (Issue has been triaged by maintainers) Jan 4, 2024

This was referenced Feb 8, 2024:
Pin torch version collabora/WhisperFusion#34 (Closed)
Pin the pytorch versions to be compatible with TensorRT-LLM collabora/WhisperFusion#35 (Merged)

praveenc mentioned this issue Feb 13, 2024:
Unable to convert Llama-2-7b-chat-hf model to TensorRT-LLM engine #1079 (Closed)

CoderHam mentioned this issue May 3, 2024:
[fix] export failure with CUDA driver < 526 and pynvml>=11.5.0 #1537 (Closed)

nv-guomingz closed this as completed Nov 15, 2024
