Setup:
Using a custom build of the GPU Operator:
NVIDIA/gpu-operator@master...Dragoncell:gpu-operator:master-gke
The GPU Operator was installed with the command below, with CDI enabled and the GPU driver preinstalled by COS:
helm upgrade -i --create-namespace --namespace gpu-operator noperator deployments/gpu-operator \
  --set driver.enabled=false \
  --set cdi.enabled=true \
  --set cdi.default=true \
  --set operator.runtimeClass=nvidia-cdi \
  --set hostRoot=/ \
  --set driverRoot=/home/kubernetes/bin/nvidia \
  --set devRoot=/ \
  --set operator.repository=gcr.io/jiamingxu-gke-dev \
  --set operator.version=v0422_04 \
  --set toolkit.installDir=/home/kubernetes/bin/nvidia \
  --set toolkit.repository=gcr.io/jiamingxu-gke-dev \
  --set toolkit.version=v4 \
  --set validator.repository=gcr.io/jiamingxu-gke-dev \
  --set validator.version=v0417_1 \
  --set devicePlugin.version=v0422_4 \
  --set devicePlugin.repository=gcr.io/jiamingxu-gke-dev
During CDI spec generation, both in the toolkit container (management CDI spec) and in the k8s device plugin (workload CDI spec), several warning-level log messages appear.
Both:
- Could not find ld.so.cache
time="2024-04-22T19:37:03Z" level=warning msg="Could not find ld.so.cache at /host/home/kubernetes/bin/nvidia/etc/ld.so.cache; creating empty cache"
time="2024-04-22T19:37:03Z" level=info msg="Using driver version 535.129.03"
time="2024-04-22T19:37:03Z" level=warning msg="Could not find ld.so.cache at /host/home/kubernetes/bin/nvidia/etc/ld.so.cache; creating empty cache"
- Feature-related files (sockets, MPS, GSP firmware)
time="2024-04-22T19:37:03Z" level=warning msg="Could not locate /nvidia-persistenced/socket: pattern /nvidia-persistenced/socket not found"
time="2024-04-22T19:37:03Z" level=warning msg="Could not locate /nvidia-fabricmanager/socket: pattern /nvidia-fabricmanager/socket not found"
time="2024-04-22T19:37:03Z" level=warning msg="Could not locate /tmp/nvidia-mps: pattern /tmp/nvidia-mps not found"
time="2024-04-22T19:37:03Z" level=warning msg="Could not locate nvidia/535.129.03/gsp*.bin: pattern nvidia/535.129.03/gsp*.bin not found"
k8s device plugin only:
time="2024-04-22T19:37:22Z" level=warning msg="Could not locate glvnd/egl_vendor.d/10_nvidia.json: pattern glvnd/egl_vendor.d/10_nvidia.json not found"
time="2024-04-22T19:37:22Z" level=warning msg="Could not locate vulkan/icd.d/nvidia_icd.json: pattern vulkan/icd.d/nvidia_icd.json not found"
time="2024-04-22T19:37:22Z" level=warning msg="Could not locate vulkan/icd.d/nvidia_layers.json: pattern vulkan/icd.d/nvidia_layers.json not found"
time="2024-04-22T19:37:22Z" level=warning msg="Could not locate vulkan/implicit_layer.d/nvidia_layers.json: pattern vulkan/implicit_layer.d/nvidia_layers.json not found"
time="2024-04-22T19:37:22Z" level=warning msg="Could not locate egl/egl_external_platform.d/15_nvidia_gbm.json: pattern egl/egl_external_platform.d/15_nvidia_gbm.json not found"
time="2024-04-22T19:37:22Z" level=warning msg="Could not locate egl/egl_external_platform.d/10_nvidia_wayland.json: pattern egl/egl_external_platform.d/10_nvidia_wayland.json not found"
time="2024-04-22T19:37:22Z" level=warning msg="Could not locate nvidia/nvoptix.bin: pattern nvidia/nvoptix.bin not found"
time="2024-04-22T19:37:22Z" level=warning msg="Could not locate nvidia/xorg/nvidia_drv.so: pattern nvidia/xorg/nvidia_drv.so not found"
time="2024-04-22T19:37:22Z" level=warning msg="Could not locate nvidia/xorg/libglxserver_nvidia.so.535.129.03: pattern nvidia/xorg/libglxserver_nvidia.so.535.129.03 not found"
time="2024-04-22T19:37:22Z" level=warning msg="Could not locate X11/xorg.conf.d/10-nvidia.conf: pattern X11/xorg.conf.d/10-nvidia.conf not found"
....
time="2024-04-22T19:37:22Z" level=warning msg="Could not locate nvidia/xorg/nvidia_drv.so: pattern nvidia/xorg/nvidia_drv.so not found"
time="2024-04-22T19:37:22Z" level=warning msg="Could not locate nvidia/xorg/libglxserver_nvidia.so.535.129.03: pattern nvidia/xorg/libglxserver_nvidia.so.535.129.03 not found"
Are any of these warnings worth further investigation? For example, vulkan/icd.d/nvidia_icd.json is reported as not found, but the file does exist under the driver root:
/home/kubernetes/bin/nvidia/vulkan/icd.d $ ls
nvidia_icd.json
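
To check which of the other "Could not locate" warnings also refer to files that are present on the node, the comparison above can be automated. The following is a minimal sketch, not the toolkit's own lookup logic: the helper names `extract_missing_patterns` and `find_under_root` are hypothetical, and it assumes the log format matches the lines quoted above and that the files of interest live somewhere below the driver root.

```python
import re
from pathlib import Path

# Matches warnings of the form quoted in this issue:
#   level=warning msg="Could not locate <pattern>: pattern <pattern> not found"
WARNING_RE = re.compile(
    r'level=warning msg="Could not locate (?P<pattern>[^:"]+): pattern'
)


def extract_missing_patterns(log_text):
    """Return the unique patterns reported as not found, in first-seen order."""
    seen = {}
    for match in WARNING_RE.finditer(log_text):
        seen.setdefault(match.group("pattern"), None)
    return list(seen)


def find_under_root(root, pattern):
    """Search `root` and every directory below it for paths ending in `pattern`.

    This is a plain recursive glob (an assumption made for illustration);
    it only shows whether a matching file exists, not why the CDI
    generator failed to find it.
    """
    relative = pattern.lstrip("/")  # Path.glob rejects absolute patterns
    return sorted(str(p) for p in Path(root).glob("**/" + relative))


def report(log_text, root):
    """Map each warned-about pattern to the matching paths found under `root`."""
    return {p: find_under_root(root, p) for p in extract_missing_patterns(log_text)}
```

Run on the node with the device plugin log as `log_text` and `/home/kubernetes/bin/nvidia` as `root`, patterns that map to a non-empty list are files that exist under the driver root but were still reported as missing, which makes them the likeliest candidates for a search-path mismatch rather than a genuinely absent feature.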