Setting up a Kubernetes cluster with an NVIDIA GPU

by Ronald van Bekkum  on May 10, 2023

Why

For transcoding videos, a Graphics Processing Unit (GPU) may be used. Anyone who has ever played with FFmpeg, for example, knows that transcoding on a CPU requires a lot of resources; to speed up the conversion you can use a GPU. Making use of a GPU in a server is just a matter of getting a GPU-enabled server. You can run (many) processes using the GPU, but there may also be other dependencies; for example, maybe you want to save the generated file to online storage on a specific path, do some fancy logo insertion, determine closed captions, or anything else you can think of. To isolate this logic and have all your dependencies contained, you can use a (Docker) container. Enabling a GPU on a container is not very difficult: just use the correct base image and you’re good to go. But how do we then scale our logic? We want to be able to run multiple containers next to each other, with every single instance (pod) able to use the GPU. Our choice was to use Kubernetes (k8s) for this.
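As a quick aside, a minimal sketch of running such a GPU-enabled container directly with Docker could look like the line below. This assumes the NVIDIA Container Toolkit is installed on the host, and the image tag is just an example:

docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi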

What we did

To set up a cluster using a GPU, we first needed a k8s cluster with a node that has a GPU. You can set this up on a local machine or in one of the many cloud services that provide this, but make sure you have a GPU-enabled node for the setup.
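A quick sanity check that the cluster is up and the GPU node has joined:

kubectl get nodes -o wide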

Once we completed the initial setup, we created a namespace for the GPU. We chose to just name it after the vendor of the GPU in our machine: ‘nvidia’:

kubectl create namespace nvidia

Next, we installed NVIDIA’s GPU Operator using Helm. Make sure that you install the operator into the same namespace you created before:

helm install --wait --generate-name -n nvidia --create-namespace nvidia/gpu-operator
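Note that this assumes the nvidia Helm repository is already known on your machine; if it is not, adding NVIDIA’s chart repository first should look roughly like this:

helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update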

After installing the operator, several pods will start spinning up to enable GPU sharing, but we needed some more configuration to set up the slicing of the GPU correctly.
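To follow along while the operator pods come up, watching the namespace is enough:

kubectl get pods -n nvidia --watch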

Creating the configuration file

We’ve set up a simple time-slicing configuration for sharing the GPU between the pods:

apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config
data:
  tesla-t4: |-
    version: v1
    flags:
      migStrategy: none
    sharing:
      timeSlicing:
        resources:
        - name: nvidia.com/gpu
          replicas: 8

The ‘replicas’ value means that the node will ‘inform’ the k8s cluster that it has 8 GPUs available for each physical GPU it actually holds. If you have multiple GPUs or multiple GPU nodes, the advertised number multiplies accordingly.

Load the configuration

The next step was loading the configuration into the cluster. Make sure you use the same namespace as the GPU Operator is using (nvidia):

kubectl create -n nvidia -f time-slicing-config.yaml
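To double-check that the ConfigMap landed in the right namespace, you can read it back:

kubectl get configmap time-slicing-config -n nvidia -o yaml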

Patch the cluster configuration

After this, we patched our cluster policy to use the time-slicing configuration. As before, make sure the namespaces match nicely:

kubectl patch clusterpolicy/cluster-policy -n nvidia --type merge -p '{"spec": {"devicePlugin": {"config": {"name": "time-slicing-config", "default": "tesla-t4"}}}}'

As you can see, the default points to ‘tesla-t4’, which is the name of the config entry we defined above; we named it after the Tesla T4 card that we wanted as our default GPU.
After patching the cluster policy, the GPU Operator will restart the NVIDIA pods and the configuration will be applied correctly.
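To confirm the patch landed, you can read the device plugin config back from the cluster policy (this just echoes what we set above):

kubectl get clusterpolicy/cluster-policy -n nvidia -o jsonpath='{.spec.devicePlugin.config}'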

Verifying

We then verified that the node is sharing its GPU resource as configured:

kubectl describe node <my-lovely-node>

The output can vary a bit depending on your cluster and configuration (the resource is advertised as nvidia.com/gpu.shared when renameByDefault is enabled in the time-slicing config, and as nvidia.com/gpu otherwise), but it should show you how many GPUs are available:

Capacity:
  nvidia.com/gpu.shared: 16

After this, pods are ready to request a GPU! (For example, you can use one of the Docker images from NVIDIA’s GitHub as a basis and run nvidia-smi to see what happens, as sketched below.)
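As a minimal sketch (the pod name and image tag are just examples, and the resource name should match what your node advertises, as discussed above), a test pod could look like this:

apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
spec:
  restartPolicy: Never
  containers:
  - name: gpu-test
    image: nvidia/cuda:12.2.0-base-ubuntu22.04
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1   # use nvidia.com/gpu.shared if that is the advertised resource

Apply it and check the logs to see the nvidia-smi output:

kubectl apply -f gpu-test.yaml
kubectl logs gpu-test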


References:

https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/gpu-sharing.html
https://developer.nvidia.com/blog/improving-gpu-utilization-in-kubernetes/
https://github.com/NVIDIA/k8s-device-plugin