After the debacle with Rocky Linux and Red Hat, I decided to move to a more future-proof setup with my kubernetes cluster fully running on Debian. I chose Debian 12 at the time, but unfortunately Nvidia did not yet support Debian 12 in their CUDA repository. Using the CUDA repository is the easiest way to install the Nvidia drivers and much better than running a script; see the package manager installation on this Nvidia page. Therefore, I decided to use Debian 11 instead, which was supported. That on its own gave some headaches in the setup and required some hacks to get it working. Also, looking at future migrations (e.g. replacing Calico by Cilium), it is a lot better to run the same OS on all kubernetes nodes.
However, the setup of Nvidia on Debian 12 also turned out not to be that easy, and I ran into some issues with the Nvidia driver provided by the CUDA repository. Also, I found that some additional checks were missing in the intermediate steps, leading to late detection of problems.
The start for these instructions is a working cluster running kubernetes version 1.27.3 on Debian 12. The standard instructions on the kubernetes website work fine for this. Also, my setup is based on running virtual machines using KVM on a linux host with PCI passthrough of the Nvidia device. Instructions for that can be found here. There will be a single VM/kubernetes node that has the Nvidia GPU available. On that system, the Nvidia device should be recognized as shown by lspci:
# lspci | grep -i nvidia
0a:00.0 VGA compatible controller: NVIDIA Corporation GA102 [GeForce RTX 3090] (rev a1)
0b:00.0 Audio device: NVIDIA Corporation GA102 High Definition Audio Controller (rev a1)
In general, the setup consists of four steps:
- install the Nvidia driver on the host
- configure containerd to use Nvidia as the default runtime using the Nvidia Container Toolkit
- install and configure the Nvidia Device Plugin to mark nodes as being GPU nodes
- persistent configuration of the GPU
Install the nvidia driver
Here, there are two alternatives:
- the (slightly older) driver from the standard Debian repository
- the driver from the Nvidia CUDA repository
I decided to use the Nvidia repository since I was going to use other things from Nvidia as well, such as the Nvidia Container Toolkit, so getting both from the same supplier would give bigger chances of success, right? Unfortunately not: I finally got it working by using the slightly older driver from the Debian repository instead. The problem I had with the Nvidia repository was that nvidia-smi worked fine, but machine learning programs running in a container (after configuring the container toolkit, see the next step) could not find the GPU.
The main difference was that nvidia-smi showed N/A for the CUDA Version in the top right of the output. What you want to see is an actual CUDA version, which shows the maximum version of CUDA supported by the driver. For example:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.147.05   Driver Version: 525.147.05   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:0A:00.0 Off |                  N/A |
|  0%   30C    P8     9W / 300W |      1MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
Clearly, without any compute capability reported, you cannot use the GPU for computation.
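For reference, installing the driver from the Debian repository roughly comes down to the following. This is a sketch assuming a standard Debian 12 (bookworm) installation; exact package names and repository components may differ for your setup:

# enable the contrib, non-free and non-free-firmware components in /etc/apt/sources.list
deb http://deb.debian.org/debian bookworm main contrib non-free non-free-firmware

# install the driver and supporting firmware, then reboot
apt update
apt install nvidia-driver firmware-misc-nonfree
reboot

# after the reboot, nvidia-smi should show an actual CUDA version in the top right
nvidia-smi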
Configure containerd
Containerd must be configured to use the Nvidia runtime. To do this, install the Nvidia Container Toolkit as described here. After following these instructions, I ran into stability issues with crashlooping containers on the GPU-enabled kubernetes node. To fix this, you should set SystemdCgroup = true. Also, I made nvidia the default runtime using default_runtime_name = "nvidia". The modified parts of the containerd config file /etc/containerd/config.toml are:
[plugins]
  [plugins."io.containerd.grpc.v1.cri"]
    [plugins."io.containerd.grpc.v1.cri".containerd]
      default_runtime_name = "nvidia"
      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
            SystemdCgroup = true
(Indeed, the containerd config file is horrible, but it requires no further changes after this is set up.) Note that it is also possible to not use nvidia as the default runtime, but that requires more configuration when deploying pods. Pods that run with the nvidia runtime but do not actually use the GPU do not appear to get in the way of pods that do use it.
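After changing /etc/containerd/config.toml, containerd must be restarted to pick up the new configuration. As a sketch, recent versions of the container toolkit can also generate the nvidia runtime entries for you with nvidia-ctk (the --set-as-default flag makes nvidia the default runtime); verify the generated file against the snippet above:

# optionally let the container toolkit generate the nvidia runtime configuration
nvidia-ctk runtime configure --runtime=containerd --set-as-default

# restart containerd so the new configuration takes effect
systemctl restart containerd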
After installation of the Nvidia Container Toolkit, check that the GPU can actually be used from a container as follows:
# pull an example image
ctr i pull nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda10.2

# run nvidia-smi in it
ctr run --rm --gpus 0 -t nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda10.2 n nvidia-smi

# run an actual job that uses the GPU
ctr run --rm --gpus 0 -t nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda10.2 n
The output of the last job should be:
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
Install and configure the nvidia device plugin
The Nvidia Device Plugin can be installed using the instructions here.
After installation of the plugin, verify that it has found the GPU on the relevant nodes by looking at the output of the pods in the nvidia-device-plugin-daemonset DaemonSet. In some cases, you might need to do a rollout restart of the DaemonSet when problems occur, e.g. after upgrading kubernetes.
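A quick way to do these checks from the command line is shown below. The label selector is an assumption based on the standard device plugin manifest (name=nvidia-device-plugin-ds); adjust it if your deployment uses different labels:

# check that the device plugin pods are running and look at their logs
kubectl -n kube-system get pods -l name=nvidia-device-plugin-ds
kubectl -n kube-system logs -l name=nvidia-device-plugin-ds

# restart the daemonset if needed, e.g. after a kubernetes upgrade
kubectl -n kube-system rollout restart daemonset/nvidia-device-plugin-daemonset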
In my setup, I decided to use time slicing of the GPU with at most 5 concurrent tasks. Compared to other sharing options, this provides the most memory to individual jobs and allows jobs to at least run, instead of failing, when multiple jobs are active at the same time. This requires a time slicing configuration in time-slicing-config.yaml:
version: v1
flags:
  migStrategy: none
sharing:
  timeSlicing:
    renameByDefault: false
    failRequestsGreaterThanOne: false
    resources:
    - name: nvidia.com/gpu
      replicas: 5
This configuration is then passed to the Nvidia Device Plugin through the CONFIG_FILE environment variable, which points to the config mounted as a ConfigMap, using a patch nvidia-device-plugin-patch.yaml:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nvidia-device-plugin-daemonset
spec:
  template:
    spec:
      containers:
      - image: nvcr.io/nvidia/k8s-device-plugin:v0.14.0
        name: nvidia-device-plugin-ctr
        env:
        - name: CONFIG_FILE
          value: /etc/wamblee/time-slicing-config.yaml
        volumeMounts:
        - name: time-slicing-config
          mountPath: /etc/wamblee
      volumes:
      - name: time-slicing-config
        configMap:
          name: nvidia-time-slicing-config
This, together with the downloaded nvidia-device-plugin.yaml, is then applied to the kubernetes cluster using the following kustomization.yaml:
kind: Kustomization
namespace: kube-system

generatorOptions:
  disableNameSuffixHash: true

configMapGenerator:
- name: nvidia-time-slicing-config
  files:
  - time-slicing-config.yaml

resources:
- nvidia-device-plugin.yaml

patches:
- target:
    group: apps
    version: v1
    kind: DaemonSet
    name: nvidia-device-plugin-daemonset
  path: nvidia-device-plugin-patch.yaml
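Applying the kustomization and verifying the result could look like this; <gpu-node> is a placeholder for the GPU node name. With the time slicing configuration above, the node should advertise a capacity of 5 for nvidia.com/gpu:

# apply the kustomization from the directory containing the files above
kubectl apply -k .

# check that the GPU node advertises the time-sliced GPUs
kubectl describe node <gpu-node> | grep nvidia.com/gpu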
Finally, test the setup by running a GPU pod such as this one:
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
  namespace: default
spec:
  restartPolicy: Never
  containers:
  - name: cuda-container
    image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda10.2
    resources:
      limits:
        nvidia.com/gpu: 1 # requesting 1 GPU
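Assuming the manifest is saved as gpu-pod.yaml (the file name is arbitrary), the pod can be applied and checked as follows; its log should end with Test PASSED, just like the containerd test earlier:

kubectl apply -f gpu-pod.yaml
kubectl get pod gpu-pod
kubectl logs gpu-pod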
To test the concurrent execution of pods using the GPU and verify the concurrent limit of 5 that was defined above, run this job:
apiVersion: batch/v1
kind: Job
metadata:
  creationTimestamp: null
  name: gpu-job
  namespace: default
spec:
  completions: 10
  parallelism: 10
  template:
    spec:
      containers:
      - name: cuda-container
        image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda10.2
        resources:
          limits:
            nvidia.com/gpu: 1
      restartPolicy: Never
This job runs the same task 10 times. You should see 5 pods running at the same time when running this on the cluster.
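Assuming the job is saved as gpu-job.yaml, a simple way to watch this is shown below; the job-name label is added automatically by kubernetes:

kubectl apply -f gpu-job.yaml
kubectl get pods -l job-name=gpu-job -w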
Persistent configuration of the GPU
In my setup, the server gets too hot when running the GPU at full power. However, I found out that using only 300W instead of 370W avoids overheating and does not lead to lower performance in my ML jobs. I do this by placing a cron job in the file /etc/cron.d/nvidia-pl that sets the power limit at every reboot:
@reboot root nvidia-smi -pl 300
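The same limit can also be applied immediately without a reboot, and the active power limit can be checked with nvidia-smi (the exact output depends on the driver version):

# set the power limit right away
nvidia-smi -pl 300

# show the power management section, including the configured power limit
nvidia-smi -q -d POWER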
Final thoughts
Every time I need to configure something with a GPU on kubernetes, I am hoping that it will go fast, but usually it turns into a huge problem. The whole setup, which is in fact simple, should have been a piece of cake. However, the lack of good troubleshooting advice, such as the nvidia-smi check with emphasis on the CUDA version, and the missing checks after configuration of the Nvidia Container Toolkit caused a lot of headaches. Of course, the setup is quite complex if you look at it in total:
- a physical host has the GPU but is configured to ignore it and pass it through to a VM
- on the VM, the container runtime must be modified so that it makes the nvidia driver available to processes in containers
- a device plugin on kubernetes marks nodes as having GPUs and makes sure workloads requesting an Nvidia GPU are scheduled correctly.