CentOS: Cannot dlopen some GPU libraries

oxiaedzo, posted on 2022-11-07 in Other

How should I fix this problem on CentOS 7?

[jalal@goku ~]$ pip freeze | grep tensorflow
tensorflow-estimator==2.2.0
tensorflow-gpu==2.2.0
[jalal@goku ~]$ python
Python 3.8.5 (default, Mar 31 2021, 02:37:07) 
[GCC 7.3.1 20180303 (Red Hat 7.3.1-5)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
2021-06-07 23:50:07.811271: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2021-06-07 23:50:07.867796: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: 
pciBusID: 0000:05:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.6705GHz coreCount: 28 deviceMemorySize: 10.92GiB deviceMemoryBandwidth: 451.17GiB/s
2021-06-07 23:50:07.869403: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 1 with properties: 
pciBusID: 0000:06:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.6705GHz coreCount: 28 deviceMemorySize: 10.92GiB deviceMemoryBandwidth: 451.17GiB/s
2021-06-07 23:50:07.870136: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.0/lib64:
2021-06-07 23:50:07.874249: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2021-06-07 23:50:07.877819: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2021-06-07 23:50:07.878745: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2021-06-07 23:50:07.882687: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2021-06-07 23:50:07.884788: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2021-06-07 23:50:07.890952: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-06-07 23:50:07.891011: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1598] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
Num GPUs Available:  0
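
The log above names every shared library TensorFlow tries to dlopen. As a quick cross-check outside TensorFlow, the sketch below tries to load the same names with Python's ctypes, which goes through the same dynamic loader search (LD_LIBRARY_PATH, the ldconfig cache, the default system directories). The library names are copied from the log; the helper name probe_libraries is made up for this illustration.

```python
import ctypes

# Library names copied from the TensorFlow 2.2.0 log above.
GPU_LIBS = [
    "libcuda.so.1",
    "libcudart.so.10.1",
    "libcublas.so.10",
    "libcufft.so.10",
    "libcurand.so.10",
    "libcusolver.so.10",
    "libcusparse.so.10",
    "libcudnn.so.7",
]


def probe_libraries(names):
    """Return {name: True/False} depending on whether the loader can open it."""
    results = {}
    for name in names:
        try:
            ctypes.CDLL(name)  # calls dlopen() on Linux
            results[name] = True
        except OSError:
            results[name] = False
    return results


if __name__ == "__main__":
    for name, ok in probe_libraries(GPU_LIBS).items():
        print(f"{'ok     ' if ok else 'MISSING'}  {name}")
```

On the machine in the question, one would expect only libcudart.so.10.1 to be reported as missing, matching the single "Could not load dynamic library" warning.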

This happens even though the machine has two GPUs. The OS is:

[jalal@goku ~]$ lsb_release -a
LSB Version:    :core-4.1-amd64:core-4.1-noarch
Distributor ID: CentOS
Description:    CentOS Linux release 7.9.2009 (Core)
Release:    7.9.2009
Codename:   Core

Also,

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130

I tried the following, as suggested in https://github.com/tensorflow/tensorflow/issues/38194#issuecomment-629801937, but it did not work:

[jalal@goku djrn]$ ls /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudart.so.10.2
lrwxrwxrwx. 1 root root 20 Sep 21  2020 /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudart.so.10.2 -> libcudart.so.10.2.89
[jalal@goku djrn]$ sudo ln -s /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudart.so.10.2 /usr/lib/x86_64-linux-gnu/libcudart.so.10.1
[sudo] password for jalal: 
ln: failed to create symbolic link ‘/usr/lib/x86_64-linux-gnu/libcudart.so.10.1’: No such file or directory

because:

ls: cannot access /usr/lib/x86_64-linux-gnu: No such file or directory

Specifically, I need TensorFlow to work with CUDA 10.2. I can use any version of TensorFlow (preferably TensorFlow 2+), but I could not find one that supports CUDA 10.2: https://www.tensorflow.org/install/source#tested_build_configurations
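
To make the mismatch concrete, here is a small lookup sketch built from a subset of that tested-configurations table (Linux GPU builds, TensorFlow 2.0 to 2.4). The values were transcribed by hand and should be checked against the page; the function names are made up for this illustration. It shows why tensorflow-gpu 2.2.0 asks for libcudart.so.10.1 and why no release in this range lists CUDA 10.2.

```python
# Subset of the "tested build configurations" table (Linux, GPU).
# Transcribed by hand; verify against the TensorFlow page before relying on it.
TESTED_CONFIGS = {
    "2.0": {"cuda": "10.0", "cudnn": "7.4"},
    "2.1": {"cuda": "10.1", "cudnn": "7.6"},
    "2.2": {"cuda": "10.1", "cudnn": "7.6"},
    "2.3": {"cuda": "10.1", "cudnn": "7.6"},
    "2.4": {"cuda": "11.0", "cudnn": "8.0"},
}


def releases_for_cuda(cuda_version):
    """List the TensorFlow releases in the table tested with this CUDA version."""
    return sorted(
        tf for tf, cfg in TESTED_CONFIGS.items() if cfg["cuda"] == cuda_version
    )


def expected_cudart_soname(tf_release):
    """Name of the CUDA runtime library a given release looks for."""
    return "libcudart.so." + TESTED_CONFIGS[tf_release]["cuda"]


if __name__ == "__main__":
    print(releases_for_cuda("10.2"))
    print(expected_cudart_soname("2.2"))
```

With this table, releases_for_cuda("10.2") is empty, so the supported options are to install the CUDA version a release was tested with, or to build TensorFlow from source against 10.2.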
Also, according to the check below, my CUDA version is 10.2, which differs from what both nvidia-smi and nvcc --version report:

$ stat /usr/local/cuda
  File: ‘/usr/local/cuda’ -> ‘/usr/local/cuda-10.2’
  Size: 20          Blocks: 0          IO Block: 4096   symbolic link
Device: fd00h/64768d    Inode: 67157410    Links: 1
Access: (0777/lrwxrwxrwx)  Uid: (    0/    root)   Gid: (    0/    root)
Context: unconfined_u:object_r:usr_t:s0
Access: 2021-05-20 10:43:06.864530636 -0400
Modify: 2020-09-21 09:39:18.559883390 -0400
Change: 2020-09-21 09:39:18.559883390 -0400
 Birth: -
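
Since nvcc reports 10.0 while /usr/local/cuda points at cuda-10.2, more than one toolkit appears to be installed. The following sketch lists every cuda* directory under a prefix, resolves symlinks, and shows which libcudart files each one ships. The /usr/local prefix and the lib64 and targets/*/lib layouts are assumptions based on the paths shown in the question; describe_cuda_installs is a made-up helper name.

```python
import glob
import os


def describe_cuda_installs(prefix="/usr/local"):
    """Map each cuda* entry under prefix to its real path and libcudart files."""
    found = {}
    for path in sorted(glob.glob(os.path.join(prefix, "cuda*"))):
        real = os.path.realpath(path)  # e.g. /usr/local/cuda -> /usr/local/cuda-10.2
        libs = {
            os.path.basename(p)
            for pattern in ("lib64/libcudart.so*", "targets/*/lib/libcudart.so*")
            for p in glob.glob(os.path.join(real, pattern))
        }
        # lib64 is often a symlink into targets/, so a set removes duplicates.
        found[path] = {"resolves_to": real, "cudart": sorted(libs)}
    return found


if __name__ == "__main__":
    for path, info in describe_cuda_installs().items():
        print(path, "->", info["resolves_to"], info["cudart"])
```

If no directory lists a libcudart.so.10.1, that is consistent with the warning in the log: the runtime version that TensorFlow 2.2.0 looks for is simply not installed.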

P.S.: I created my virtual environment with Python's venv and do not want to use conda or pyenv.
P.P.S.: I also created this symlink, and it still does not work:

(djrn) [jalal@goku djrn]$ sudo ln -s /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudart.so.10.2 /usr/lib/libcudart.so.10.1
[sudo] password for jalal: 
(djrn) [jalal@goku djrn]$ ls /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudart.so.10.2
lrwxrwxrwx. 1 root root 20 Sep 21  2020 /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudart.so.10.2 -> libcudart.so.10.2.89
(djrn) [jalal@goku djrn]$ python
Python 3.8.5 (default, Mar 31 2021, 02:37:07) 
[GCC 7.3.1 20180303 (Red Hat 7.3.1-5)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
2021-06-08 01:40:39.152040: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2021-06-08 01:40:39.401399: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: 
pciBusID: 0000:05:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.6705GHz coreCount: 28 deviceMemorySize: 10.92GiB deviceMemoryBandwidth: 451.17GiB/s
2021-06-08 01:40:39.403106: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 1 with properties: 
pciBusID: 0000:06:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.6705GHz coreCount: 28 deviceMemorySize: 10.92GiB deviceMemoryBandwidth: 451.17GiB/s
2021-06-08 01:40:39.403438: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.0/lib64:
2021-06-08 01:40:39.406985: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2021-06-08 01:40:39.410320: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2021-06-08 01:40:39.410912: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2021-06-08 01:40:39.414628: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2021-06-08 01:40:39.416297: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2021-06-08 01:40:39.422208: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-06-08 01:40:39.422260: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1598] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
Num GPUs Available:  0
>>>
Answer #1 (bfrts1fy)

Thanks to jonno_FTW:

$ sudo ln -s /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudart.so.10.2 /usr/lib/x86_64-linux-gnu/libcudart.so.10.1
$ export LD_LIBRARY_PATH=/usr/lib

This solved the problem. Now I see the following output:

(djrn) [jalal@goku djrn]$ python
Python 3.8.5 (default, Mar 31 2021, 02:37:07) 
[GCC 7.3.1 20180303 (Red Hat 7.3.1-5)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
2021-06-08 01:45:59.138197: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2021-06-08 01:45:59.191833: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: 
pciBusID: 0000:05:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.6705GHz coreCount: 28 deviceMemorySize: 10.92GiB deviceMemoryBandwidth: 451.17GiB/s
2021-06-08 01:45:59.193773: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 1 with properties: 
pciBusID: 0000:06:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.6705GHz coreCount: 28 deviceMemorySize: 10.92GiB deviceMemoryBandwidth: 451.17GiB/s
2021-06-08 01:45:59.194216: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2021-06-08 01:45:59.197372: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2021-06-08 01:45:59.200555: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2021-06-08 01:45:59.201078: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2021-06-08 01:45:59.204664: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2021-06-08 01:45:59.206295: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2021-06-08 01:45:59.212072: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-06-08 01:45:59.217509: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0, 1
Num GPUs Available:  2
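
The same idea can be sketched without sudo and without touching system directories. Note that /usr/lib/x86_64-linux-gnu does not exist on this CentOS machine (see the question), so the link that LD_LIBRARY_PATH=/usr/lib picks up is presumably the one created earlier at /usr/lib/libcudart.so.10.1. The sketch below instead puts the link in a user-owned directory and starts a fresh Python process with that directory prepended to LD_LIBRARY_PATH, because the dynamic loader reads the variable when the process starts. The directory ~/.local/cuda-compat and both helper names are hypothetical choices. Presenting the 10.2 runtime under the 10.1 name is a workaround rather than a supported configuration; it is reported to work in this answer, but that is not guaranteed in general.

```python
import os
import subprocess
import sys


def make_compat_link(real_lib, link_dir, wanted_name):
    """Create link_dir/wanted_name -> real_lib and return the link path."""
    os.makedirs(link_dir, exist_ok=True)
    link = os.path.join(link_dir, wanted_name)
    if os.path.islink(link):
        os.remove(link)  # replace a stale link from an earlier attempt
    os.symlink(real_lib, link)
    return link


def env_with_library_dir(link_dir, base_env=None):
    """Copy the environment with link_dir prepended to LD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    old = env.get("LD_LIBRARY_PATH", "")
    env["LD_LIBRARY_PATH"] = link_dir + (":" + old if old else "")
    return env


if __name__ == "__main__":
    # Source path taken from the question; the link directory is hypothetical.
    real = "/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudart.so.10.2"
    link_dir = os.path.expanduser("~/.local/cuda-compat")
    if os.path.exists(real):
        make_compat_link(real, link_dir, "libcudart.so.10.1")
        check = "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
        # A new process is needed so the loader sees the updated variable.
        subprocess.run([sys.executable, "-c", check],
                       env=env_with_library_dir(link_dir), check=False)
```

Unlike the export in the answer, this prepends to the existing LD_LIBRARY_PATH instead of replacing it, so other directories already on the path stay visible.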
