System background
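First, get nvidia-container running by following the method described in the article 《在飞牛OS使用libnvidia-container让docker容器支持NVIDIA GPU加速》 (Using libnvidia-container on fnOS to give Docker containers NVIDIA GPU acceleration):
https://club.fnnas.com/forum.php?mod=viewthread&tid=14106
If you have not done that yet, try running the commands from that tutorial. Some problems come up along the way; they are covered below.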
Problems you may run into
-
While installing nvidia-container-toolkit, output similar to this appears:
ldconfig: /lib/x86_64-linux-gnu/libnvidia-fbc.so.1 is not a symbolic link
ldconfig: /lib/x86_64-linux-gnu/libnvidia-opticalflow.so.1 is not a symbolic link
ldconfig: /lib/x86_64-linux-gnu/libGLESv1_CM_nvidia.so.1 is not a symbolic link
ldconfig: /lib/x86_64-linux-gnu/libnvidia-egl-gbm.so.1 is not a symbolic link
Don't panic. Copy these printed lines into a text file; they are needed later.
-
Running the command
docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
fails with an error like this:
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy' nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown.
or like this:
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy' nvidia-container-cli: detection error: open failed: /usr/lib/x86_64-linux-gnu/libvdpau_nvidia.so: permission denied: unknown.
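The ldconfig warnings from the first problem can also be captured to a file instead of copied by hand. The following is only a rough sketch: `save_warnings` is a hypothetical helper name, the output path is arbitrary, and it assumes that re-running `sudo ldconfig` reproduces the same warnings on stderr.

```shell
#!/bin/sh
# save_warnings FILE (hypothetical helper, not part of any NVIDIA tool):
# keeps only the "is not a symbolic link" lines read from stdin,
# prints them, and writes them to FILE.
save_warnings() {
    grep 'is not a symbolic link' | tee "$1"
}

# Example (assumption: ldconfig writes these warnings to stderr, hence 2>&1):
#   sudo ldconfig 2>&1 | save_warnings "$HOME/ldconfig-warnings.txt"
```

Everything else in the installer output is dropped, so the saved file contains only the lines that need a symlink fix.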
What you need to do
-
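Uninstall the previously installed driver in the App Store. Then open the file manager page -> administrator view, find the folders whose names start with @app, check inside each one, and delete the Nvidia-Driver folder.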
-
Reinstall the driver from the App Store.
-
For every path that appears in the text you saved earlier, such as /lib/x86_64-linux-gnu/libnvidia-fbc.so.1, run a command of this form:
sudo ln -sf /lib/x86_64-linux-gnu/libnvidia-fbc.so.560.28.03 /lib/x86_64-linux-gnu/libnvidia-fbc.so.1
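Remember to replace the file names. Below are the commands I ran myself; yours may differ (for example in the version suffix, 560.28.03 here), so compare against your own files before running them.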
sudo ln -sf /lib/x86_64-linux-gnu/libnvidia-fbc.so.560.28.03 /lib/x86_64-linux-gnu/libnvidia-fbc.so.1
sudo ln -sf /lib/x86_64-linux-gnu/libnvidia-opticalflow.so.560.28.03 /lib/x86_64-linux-gnu/libnvidia-opticalflow.so.1
sudo ln -sf /lib/x86_64-linux-gnu/libGLESv1_CM_nvidia.so.560.28.03 /lib/x86_64-linux-gnu/libGLESv1_CM_nvidia.so.1
sudo ln -sf /lib/x86_64-linux-gnu/libnvidia-egl-gbm.so.560.28.03 /lib/x86_64-linux-gnu/libnvidia-egl-gbm.so.1
sudo ln -sf /lib/x86_64-linux-gnu/libnvidia-nvvm.so.560.28.03 /lib/x86_64-linux-gnu/libnvidia-nvvm.so.4
sudo ln -sf /lib/x86_64-linux-gnu/libnvidia-ngx.so.560.28.03 /lib/x86_64-linux-gnu/libnvidia-ngx.so.1
sudo ln -sf /lib/x86_64-linux-gnu/libOpenCL.so.560.28.03 /lib/x86_64-linux-gnu/libOpenCL.so.1
sudo ln -sf /lib/x86_64-linux-gnu/libnvidia-cfg.so.560.28.03 /lib/x86_64-linux-gnu/libnvidia-cfg.so.1
sudo ln -sf /lib/x86_64-linux-gnu/libnvcuvid.so.560.28.03 /lib/x86_64-linux-gnu/libnvcuvid.so.1
sudo ln -sf /lib/x86_64-linux-gnu/libnvidia-vksc-core.so.560.28.03 /lib/x86_64-linux-gnu/libnvidia-vksc-core.so.1
sudo ln -sf /lib/x86_64-linux-gnu/libcudadebugger.so.560.28.03 /lib/x86_64-linux-gnu/libcudadebugger.so.1
sudo ln -sf /lib/x86_64-linux-gnu/libnvidia-egl-wayland.so.560.28.03 /lib/x86_64-linux-gnu/libnvidia-egl-wayland.so.1
sudo ln -sf /lib/x86_64-linux-gnu/libnvidia-allocator.so.560.28.03 /lib/x86_64-linux-gnu/libnvidia-allocator.so.1
sudo ln -sf /lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.560.28.03 /lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.1
sudo ln -sf /lib/x86_64-linux-gnu/libGLESv2_nvidia.so.560.28.03 /lib/x86_64-linux-gnu/libGLESv2_nvidia.so.2
sudo ln -sf /lib/x86_64-linux-gnu/libnvidia-encode.so.560.28.03 /lib/x86_64-linux-gnu/libnvidia-encode.so.1
sudo ln -sf /lib/x86_64-linux-gnu/libnvoptix.so.560.28.03 /lib/x86_64-linux-gnu/libnvoptix.so.1
-
For errors like open failed: /usr/lib/x86_64-linux-gnu/libvdpau_nvidia.so: permission denied, run
sudo chown root:root /usr/lib/x86_64-linux-gnu/libvdpau_nvidia.so
Then run again
docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
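and repeat until no more errors appear. If you hit other problems, leave a comment below and I will help with whatever I can solve.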
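The symlink commands in the steps above can be generated from the saved warnings rather than typed one by one. This is a minimal sketch, not a tested tool: `gen_links` is a hypothetical helper name, the warnings file and the version string are passed in by you, and it assumes each saved line has the exact form `ldconfig: <path>.so.<N> is not a symbolic link` with no spaces in the path. It only prints the commands, so you can review them and confirm that the versioned target files exist before running anything.

```shell
#!/bin/sh
# gen_links FILE VERSION (hypothetical helper):
# for each saved ldconfig warning in FILE, print the matching
# "sudo ln -sf <lib>.so.<VERSION> <lib>.so.<N>" command. Nothing is executed.
gen_links() {
    # Split ".../libfoo.so.N" into the base path and the trailing number N.
    sed -n 's|^ldconfig: \(.*\)\.so\.\([0-9][0-9]*\) is not a symbolic link$|\1 \2|p' "$1" |
    while read -r base soname; do
        printf 'sudo ln -sf %s.so.%s %s.so.%s\n' "$base" "$2" "$base" "$soname"
    done
}

# Example (file name and version are assumptions, adjust to your system):
#   gen_links "$HOME/ldconfig-warnings.txt" 560.28.03
```

The trailing number is taken from the warning line itself because it is not always 1: the list above includes libnvidia-nvvm.so.4 and libGLESv2_nvidia.so.2.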