{"id":2882,"date":"2023-12-25T20:24:25","date_gmt":"2023-12-25T20:24:25","guid":{"rendered":"https:\/\/brakkee.org\/site\/?p=2882"},"modified":"2025-04-07T14:18:59","modified_gmt":"2025-04-07T14:18:59","slug":"the-nvidia-device-plugin-on-debian-12","status":"publish","type":"post","link":"https:\/\/brakkee.org\/site\/2023\/12\/25\/the-nvidia-device-plugin-on-debian-12\/","title":{"rendered":"The Nvidia device plugin on Debian 12"},"content":{"rendered":"<p>After the debacle with Rocky Linux and Red Hat, I decided to move to a more future-proof setup with my kubernetes cluster fully running on Debian. I chose Debian 12, but at the time Nvidia did not yet support Debian 12 in their <a href=\"https:\/\/developer.download.nvidia.com\/compute\/cuda\/repos\/debian12\/x86_64\/\">CUDA repository<\/a>. Using the CUDA repository is the easiest way to install the Nvidia drivers, and much better than running an installer script; see the package manager installation section on this <a href=\"https:\/\/docs.nvidia.com\/cuda\/cuda-installation-guide-linux\/index.html\">Nvidia page<\/a>. Therefore, I decided to use Debian 11 instead, which was supported. That on its own gave some headaches in the setup and required some hacks to get it working. Also, looking at future migrations (considering e.g. replacing <a href=\"https:\/\/docs.tigera.io\/calico\/latest\/about\/\">calico<\/a> with <a href=\"https:\/\/cilium.io\/\">cilium<\/a>), it is a lot better to run the same OS on all kubernetes nodes.<\/p>\n<p><!--more--><\/p>\n<p>However, setting up Nvidia on Debian 12 also turned out not to be that easy, and I ran into some issues with the Nvidia driver provided by the CUDA repository. I also found that some additional checks were missing in the intermediate steps, leading to late detection of problems.<\/p>\n<p>The starting point for these instructions is a working cluster running kubernetes version 1.27.3 on Debian 12. 
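A quick sanity check of that starting point (a sketch; node names and versions will of course differ per cluster):<\/p>\n<pre># all nodes should be Ready and report the expected kubernetes version and OS\r\nkubectl get nodes -o wide\r\n<\/pre>\n<p>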
The standard instructions on <a href=\"https:\/\/kubernetes.io\/docs\/setup\/production-environment\/tools\/kubeadm\/create-cluster-kubeadm\/\">the kubernetes website<\/a> work fine for this. Also, my setup is based on running virtual machines using KVM on a Linux host with PCI passthrough of the Nvidia device. Instructions for that can be found <a href=\"https:\/\/www.server-world.info\/en\/note?os=CentOS_7&amp;p=kvm&amp;f=\">here<\/a>. There will be a single VM\/kubernetes node that has the Nvidia GPU available. On that system, the Nvidia device should be recognized, as shown by <code>lspci<\/code>:<\/p>\n<pre># lspci | grep -i nvidia\r\n0a:00.0 VGA compatible controller: NVIDIA Corporation GA102 [GeForce RTX 3090] (rev a1)\r\n0b:00.0 Audio device: NVIDIA Corporation GA102 High Definition Audio Controller (rev a1)\r\n<\/pre>\n<p>In general, the setup consists of four steps:<\/p>\n<ul>\n<li>install the Nvidia driver on the host<\/li>\n<li>configure containerd to use Nvidia as the default runtime using the Nvidia Container Toolkit<\/li>\n<li>install and configure the Nvidia Device Plugin to mark nodes as being GPU nodes<\/li>\n<li>persistent configuration of the GPU<\/li>\n<\/ul>\n<h2>Install the nvidia driver<\/h2>\n<p>Here, there are two alternatives:<\/p>\n<ul>\n<li><a href=\"https:\/\/docs.nvidia.com\/cuda\/cuda-installation-guide-linux\/index.html#debian\">use the driver from the NVIDIA repository<\/a><\/li>\n<li><a href=\"https:\/\/wiki.debian.org\/NvidiaGraphicsDrivers#Debian_12_.22Bookworm.22\">use the driver from the Debian repository<\/a><\/li>\n<\/ul>\n<p>I decided to use the Nvidia repository, since I was going to use other things from Nvidia as well, such as the Nvidia container toolkit, so getting both from the same supplier would give a bigger chance of success, right? Unfortunately not: I finally got it working by using the slightly older driver from the Debian repository instead. 
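As a sketch, the Debian-repository route boils down to the following (it assumes the contrib, non-free and non-free-firmware components are enabled in <code>\/etc\/apt\/sources.list<\/code>; package names are those from the Debian wiki):<\/p>\n<pre># install the driver and GPU firmware from the Debian repository\r\napt-get update\r\napt-get install nvidia-driver firmware-misc-nonfree\r\n# reboot so the nvidia kernel module replaces nouveau\r\nreboot\r\n<\/pre>\n<p>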
The problem I had with the Nvidia repository was that <code>nvidia-smi<\/code> worked fine, but machine learning programs running in a container (after configuring the container toolkit, next step) could not find the GPU.<\/p>\n<p>The main difference was that <code>nvidia-smi<\/code> showed <code>N\/A<\/code> for the <code>CUDA Version<\/code> in the top right of the output. What you want to see there is an actual CUDA version, which indicates the maximum CUDA version supported by the driver. For example:<\/p>\n<pre>+-----------------------------------------------------------------------------+\r\n| NVIDIA-SMI 525.147.05   Driver Version: 525.147.05   CUDA Version: 12.0     |\r\n|-------------------------------+----------------------+----------------------+\r\n| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |\r\n| Fan  Temp  Perf  Pwr:Usage\/Cap|         Memory-Usage | GPU-Util  Compute M. |\r\n|                               |                      |               MIG M. |\r\n|===============================+======================+======================|\r\n|   0  NVIDIA GeForce ...  
On   | 00000000:0A:00.0 Off |                  N\/A |\r\n|  0%   30C    P8     9W \/ 300W |      1MiB \/ 24576MiB |      0%      Default |\r\n|                               |                      |                  N\/A |\r\n+-------------------------------+----------------------+----------------------+\r\n                                                                               \r\n+-----------------------------------------------------------------------------+\r\n| Processes:                                                                  |\r\n|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |\r\n|        ID   ID                                                   Usage      |\r\n|=============================================================================|\r\n|  No running processes found                                                 |\r\n+-----------------------------------------------------------------------------+\r\n\r\n<\/pre>\n<p>Clearly, without any compute capability you cannot use the GPU for computation.<\/p>\n<h2>Configure containerd<\/h2>\n<p>Containerd must be configured to use the Nvidia runtime. To do this, install the container toolkit as described <a href=\"https:\/\/docs.nvidia.com\/datacenter\/cloud-native\/container-toolkit\/latest\/install-guide.html\">here<\/a>. After following these instructions, I ran into stability issues with crashlooping containers on the GPU-enabled kubernetes node. To fix this, set <code>SystemdCgroup = true<\/code>. Also, I made nvidia the default runtime using <code>default_runtime_name = \"nvidia\"<\/code>. 
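Most of the runtime configuration can be generated with the toolkit's <code>nvidia-ctk<\/code> helper (a sketch; the <code>--set-as-default<\/code> option may depend on the toolkit version):<\/p>\n<pre># add the nvidia runtime to the containerd config and make it the default\r\nnvidia-ctk runtime configure --runtime=containerd --set-as-default\r\n# restart containerd to apply the change\r\nsystemctl restart containerd\r\n<\/pre>\n<p>The <code>SystemdCgroup = true<\/code> setting still has to be added by hand. 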
The modified parts of the containerd config file <code>\/etc\/containerd\/config.toml<\/code> are:<\/p>\n<pre>[plugins]\r\n  [plugins.\"io.containerd.grpc.v1.cri\"]\r\n    [plugins.\"io.containerd.grpc.v1.cri\".containerd]\r\n      default_runtime_name = \"nvidia\"\r\n      [plugins.\"io.containerd.grpc.v1.cri\".containerd.runtimes]\r\n        [plugins.\"io.containerd.grpc.v1.cri\".containerd.runtimes.nvidia]\r\n          [plugins.\"io.containerd.grpc.v1.cri\".containerd.runtimes.nvidia.options]\r\n            SystemdCgroup = true\r\n<\/pre>\n<p>(Indeed, the containerd config file is horrible, but it requires no further changes after this setup.) Note that it is also possible not to use nvidia as the default runtime, but that requires more configuration when deploying pods. Also, pods that run under the nvidia runtime without actually using the GPU do not appear to get in the way of pods that do use it.<\/p>\n<p>After installation of the Nvidia Container Toolkit, check that the GPU is used as follows:<\/p>\n<pre># pull an example image\r\nctr i pull nvcr.io\/nvidia\/k8s\/cuda-sample:vectoradd-cuda10.2\r\n# run nvidia-smi in it\r\nctr run --rm --gpus 0 -t nvcr.io\/nvidia\/k8s\/cuda-sample:vectoradd-cuda10.2 n nvidia-smi\r\n# run an actual job that uses the GPU. 
\r\nctr run --rm --gpus 0 -t nvcr.io\/nvidia\/k8s\/cuda-sample:vectoradd-cuda10.2 n \r\n<\/pre>\n<p>The output of the last job should be:<\/p>\n<pre>[Vector addition of 50000 elements]\r\nCopy input data from the host memory to the CUDA device\r\nCUDA kernel launch with 196 blocks of 256 threads\r\nCopy output data from the CUDA device to the host memory\r\nTest PASSED\r\nDone\r\n\r\n<\/pre>\n<h2>Install and configure the nvidia device plugin<\/h2>\n<p>The Nvidia Device Plugin can be installed using the instructions <a href=\"https:\/\/github.com\/NVIDIA\/k8s-device-plugin\">here<\/a>. After installation of the plugin, verify that it has found the GPU on the relevant nodes by looking at the output of the pods in the <code>nvidia-device-plugin-daemonset<\/code> DaemonSet. In some cases, e.g. after upgrading kubernetes, you might need to do a rollout restart of the daemonset when problems occur.<\/p>\n<p>In my setup, I decided to use time slicing of the GPU with at most 5 concurrent tasks. This configuration option provides the most memory to individual jobs and allows jobs to at least run, instead of failing, when multiple jobs are running. 
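The device plugin checks can be done with kubectl roughly as follows (a sketch; the node name is a placeholder and the pod label is an assumption based on the plugin's default manifest):<\/p>\n<pre># 'gpunode1' is a placeholder for your GPU node's name;\r\n# nvidia.com\/gpu should appear in the node's capacity\r\nkubectl describe node gpunode1 | grep nvidia.com\/gpu\r\n# logs of the plugin pods (label is an assumption)\r\nkubectl -n kube-system logs -l name=nvidia-device-plugin-ds\r\n# rollout restart in case of problems, e.g. after a kubernetes upgrade\r\nkubectl -n kube-system rollout restart daemonset nvidia-device-plugin-daemonset\r\n<\/pre>\n<p>With the time slicing configuration below in place, the capacity reported for <code>nvidia.com\/gpu<\/code> becomes 5 instead of 1. 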
Time slicing requires a configuration file <code>time-slicing-config.yaml<\/code>:<\/p>\n<pre>version: v1\r\nflags:\r\n  migStrategy: none\r\nsharing:\r\n  timeSlicing:\r\n    renameByDefault: false\r\n    failRequestsGreaterThanOne: false\r\n    resources:\r\n      - name: nvidia.com\/gpu\r\n        replicas: 5\r\n\r\n<\/pre>\n<p>This configuration is then passed to the Nvidia Device Plugin through the <code>CONFIG_FILE<\/code> environment variable, which refers to the config mounted as a config map, using a patch <code>nvidia-device-plugin-patch.yaml<\/code>:<\/p>\n<pre>apiVersion: apps\/v1\r\nkind: DaemonSet\r\nmetadata:\r\n  name: nvidia-device-plugin-daemonset\r\nspec:\r\n  template:\r\n    spec:\r\n      containers:\r\n      - image: nvcr.io\/nvidia\/k8s-device-plugin:v0.14.0\r\n        name: nvidia-device-plugin-ctr\r\n        env:\r\n          - name: CONFIG_FILE\r\n            value: \/etc\/wamblee\/time-slicing-config.yaml\r\n        volumeMounts:\r\n        - name: time-slicing-config\r\n          mountPath: \/etc\/wamblee\r\n      volumes:\r\n      - name: time-slicing-config\r\n        configMap:\r\n          name: nvidia-time-slicing-config\r\n\r\n<\/pre>\n<p>This, together with the downloaded <code>nvidia-device-plugin.yaml<\/code>, is then applied to the kubernetes cluster using the following <code>kustomization.yaml<\/code>:<\/p>\n<pre>apiVersion: kustomize.config.k8s.io\/v1beta1\r\nkind: Kustomization\r\n\r\nnamespace: kube-system\r\n\r\ngeneratorOptions:\r\n  disableNameSuffixHash: true\r\n\r\nconfigMapGenerator:\r\n  - name: nvidia-time-slicing-config\r\n    files:\r\n      - time-slicing-config.yaml\r\n\r\nresources:\r\n  - nvidia-device-plugin.yaml\r\n\r\npatches:\r\n  - target:\r\n      group: apps\r\n      version: v1\r\n      kind: DaemonSet\r\n      name: nvidia-device-plugin-daemonset\r\n    path: nvidia-device-plugin-patch.yaml\r\n\r\n<\/pre>\n<p>Finally, test the setup by running a GPU pod such as this one:<\/p>\n<pre>apiVersion: v1\r\nkind: Pod\r\nmetadata:\r\n  name: gpu-pod\r\n  namespace: 
default\r\nspec:\r\n  restartPolicy: Never\r\n  containers:\r\n    - name: cuda-container\r\n      image: nvcr.io\/nvidia\/k8s\/cuda-sample:vectoradd-cuda10.2\r\n      resources:\r\n        limits:\r\n          nvidia.com\/gpu: 1 # requesting 1 GPU\r\n<\/pre>\n<p>To test the concurrent execution of pods using the GPU and verify the concurrency limit of 5 that was defined above, run this job:<\/p>\n<pre>apiVersion: batch\/v1\r\nkind: Job\r\nmetadata:\r\n  name: gpu-job\r\n  namespace: default\r\nspec:\r\n  completions: 10\r\n  parallelism: 10\r\n  template:\r\n    spec:\r\n      containers:\r\n        - name: cuda-container\r\n          image: nvcr.io\/nvidia\/k8s\/cuda-sample:vectoradd-cuda10.2\r\n          resources:\r\n            limits:\r\n               nvidia.com\/gpu: 1\r\n      restartPolicy: Never\r\n<\/pre>\n<p>This job runs the same task 10 times. You should see at most 5 pods running at the same time when running this on the cluster.<\/p>\n<h2>Persistent configuration of the GPU<\/h2>\n<p>In my setup, the server gets too hot when running the GPU at full power. However, I found out that using only 300W instead of 370W avoids overheating and does not lead to lower performance in ML jobs. I do this by placing a cron job in a <code>\/etc\/cron.d\/nvidia-pl<\/code> file that sets the power limit at every reboot:<\/p>\n<pre>@reboot root nvidia-smi -pl 300\r\n<\/pre>\n<h2>Final thoughts<\/h2>\n<p>Every time I need to configure something with a GPU on kubernetes, I hope it will go quickly, but usually it turns into a huge problem. The whole setup, which is in fact simple, should have been a piece of cake. However, the lack of good troubleshooting advice, such as the <code>nvidia-smi<\/code> check with emphasis on the CUDA version, and the missing checks after configuring the nvidia container toolkit, caused a lot of headaches. 
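As an aside, whether the power limit from the cron job is actually applied after a reboot can be checked with <code>nvidia-smi<\/code>:<\/p>\n<pre># show the current and default power limits of the GPU\r\nnvidia-smi -q -d POWER | grep -i 'power limit'\r\n<\/pre>\n<p>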
Of course, the setup is quite complex if you look at it in total:<\/p>\n<ul>\n<li>a physical host has the GPU but is configured to ignore it and pass it through to a VM<\/li>\n<li>on the VM, the container runtime must be modified so that it makes the nvidia driver available to processes in containers<\/li>\n<li>a device plugin on kubernetes marks nodes as having GPUs and makes sure workloads requesting an NVIDIA GPU are scheduled correctly.<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>After the debacle with Rocky linux and Red Hat, I decided to move to a more future proof setup with my kubernetes cluster fully running on Debian. I chose Debian 12 at the time but unfortunately Nvidia did not yet &hellip; <a href=\"https:\/\/brakkee.org\/site\/2023\/12\/25\/the-nvidia-device-plugin-on-debian-12\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"footnotes":""},"categories":[10],"tags":[],"_links":{"self":[{"href":"https:\/\/brakkee.org\/site\/wp-json\/wp\/v2\/posts\/2882"}],"collection":[{"href":"https:\/\/brakkee.org\/site\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/brakkee.org\/site\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/brakkee.org\/site\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/brakkee.org\/site\/wp-json\/wp\/v2\/comments?post=2882"}],"version-history":[{"count":26,"href":"https:\/\/brakkee.org\/site\/wp-json\/wp\/v2\/posts\/2882\/revisions"}],"predecessor-version":[{"id":3008,"href":"https:\/\/brakkee.org\/site\/wp-json\/wp\/v2\/posts\/2882\/revisions\/3008"}],"wp:attachment":[{"href":"https:\/\/brakkee.org\/site\/wp-json\/wp\/v2\/media?parent=2882"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/brakkee.org\/site\/wp-json\/wp\/v2\/categori
es?post=2882"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/brakkee.org\/site\/wp-json\/wp\/v2\/tags?post=2882"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}