"Import Error: /libs/libth_common.so: Undefined Symbol" While Building #808
Please ensure that you build and run TensorRT-LLM in the same environment. Alternatively, you can try building TensorRT-LLM in a Docker container by executing this command:
Thank you!
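An undefined-symbol `ImportError` from `libth_common.so` usually means the wheel was built against a different `libtorch` than the one present at runtime, which is why building and running in the same environment fixes it. A quick, hypothetical diagnostic (the helper name and package list are illustrative) is to print the installed versions in both the build and run environments and compare them:

```python
from importlib.metadata import version, PackageNotFoundError

def installed_versions(pkgs=("torch", "tensorrt_llm", "pynvml")):
    """Report each package's installed version, or None if absent."""
    out = {}
    for pkg in pkgs:
        try:
            out[pkg] = version(pkg)
        except PackageNotFoundError:
            out[pkg] = None
    return out

# Run this in both the build container and the run container;
# any difference in the torch version is a likely cause of the
# undefined-symbol ImportError.
print(installed_versions())
```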
Using tensorrt-llm 0.6.1, the error changes to this:

```
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/pynvml/nvml.py", line 850, in _nvmlGetFunctionPointer
    _nvmlGetFunctionPointer_cache[name] = getattr(nvmlLib, name)
  File "/usr/lib/python3.10/ctypes/__init__.py", line 387, in __getattr__
    func = self.__getitem__(name)
  File "/usr/lib/python3.10/ctypes/__init__.py", line 392, in __getitem__
    func = self._FuncPtr((name_or_ordinal, self))
AttributeError: /lib/x86_64-linux-gnu/libnvidia-ml.so.1: undefined symbol: nvmlDeviceGetMemoryInfo_v2

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/TensorRT-LLM/./examples/llama/build.py", line 906, in <module>
    build(0, args)
  File "/TensorRT-LLM/./examples/llama/build.py", line 850, in build
    engine = build_rank_engine(builder, builder_config, engine_name,
  File "/TensorRT-LLM/./examples/llama/build.py", line 609, in build_rank_engine
    profiler.print_memory_usage(f'Rank {rank} Engine build starts')
  File "/TensorRT-LLM/tensorrt_llm/profiler.py", line 197, in print_memory_usage
    alloc_device_mem, _, _ = device_memory_info(device=device)
  File "/TensorRT-LLM/tensorrt_llm/profiler.py", line 148, in device_memory_info
    mem_info = _device_get_memory_info_fn(handle)
  File "/usr/local/lib/python3.10/dist-packages/pynvml/nvml.py", line 2438, in nvmlDeviceGetMemoryInfo
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetMemoryInfo_v2")
  File "/usr/local/lib/python3.10/dist-packages/pynvml/nvml.py", line 853, in _nvmlGetFunctionPointer
    raise NVMLError(NVML_ERROR_FUNCTION_NOT_FOUND)
pynvml.nvml.NVMLError_FunctionNotFound: Function Not Found
```

Thank you, but I'm already developing inside a Docker container, and building another Docker image within it seems constrained, so...
The original error, for reference:

```
ImportError: /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libth_common.so: undefined symbol: _ZN5torch6detail10class_baseC2ERKSsS3_SsRKSt9type_infoS6_
```
I too faced this issue. This was the fix: NVIDIA/k8s-device-plugin#331 (comment)
Thanks |
I'm still facing the same issue.

I'm still getting the same error. My environment: development hardware is Google Cloud. Error message:
- There is a bug that was fixed in the 526 driver release. For driver versions older than 526, the recommendation is to downgrade pynvml to 11.4.0, and to use 11.5.0 only with drivers 526 or newer. The fix makes TensorRT-LLM fall back to the legacy pynvml memory-usage function, even with pynvml 11.5.0, when the driver version is older than 526. This is also mentioned in the issue: NVIDIA#808 (comment)
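The fallback above can be sketched as a driver-version check. This is a minimal illustration, not TensorRT-LLM's actual code: the helper name and the 526 threshold constant are mine, and only `nvmlSystemGetDriverVersion`/`nvmlDeviceGetMemoryInfo` in the comments are real pynvml API names:

```python
# Drivers older than the 526 release do not export nvmlDeviceGetMemoryInfo_v2,
# so callers must use the legacy nvmlDeviceGetMemoryInfo symbol instead.
LEGACY_DRIVER_MAJOR = 526

def needs_legacy_memory_info(driver_version: str) -> bool:
    """Return True if the driver is too old for nvmlDeviceGetMemoryInfo_v2.

    driver_version is the string reported by pynvml.nvmlSystemGetDriverVersion(),
    e.g. "525.85.12".
    """
    major = int(driver_version.split(".")[0])
    return major < LEGACY_DRIVER_MAJOR

# Usage sketch (requires an NVIDIA driver, so left as comments):
#   import pynvml
#   pynvml.nvmlInit()
#   drv = pynvml.nvmlSystemGetDriverVersion()
#   if isinstance(drv, bytes):
#       drv = drv.decode()
#   handle = pynvml.nvmlDeviceGetHandleByIndex(0)
#   if needs_legacy_memory_info(drv):
#       mem = pynvml.nvmlDeviceGetMemoryInfo(handle)  # legacy call path
#   # ...otherwise the v2 query is safe to use.
```

If you cannot upgrade the driver, downgrading to `pip install pynvml==11.4.0` (as recommended in the thread) avoids the v2 lookup entirely.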
Please feel free to reopen this ticket if needed.
Same issue when I execute the launch_triton_server.py file.
Hi,
while trying to run this,
we ran into this FATAL ERROR, a strange undefined symbol.
Before that, we built the wheel through
And the software versions are
Do you have any clue on how to solve this? Many thanks!