In this blog post I’m going to detail how to deploy and configure an Nvidia GPU enabled Tanzu Kubernetes Grid cluster in AWS. The method is similar for Azure; for vSphere there are a number of additional steps to prepare the system. I’m going to essentially follow the official documentation, then run some of the Nvidia tests. As always, it’s good to have a visual reference for these kinds of deployments.
Pre-Reqs
- Nvidia today only supports Ubuntu based images for a TKG deployment
- For this blog I’ve already deployed my TKG Management cluster in AWS
Deploy a GPU enabled workload cluster
It’s simple: just deploy a workload cluster whose compute plane nodes (workers) use a GPU enabled instance type.
You can create a new cluster YAML file from scratch, or clone one of your existing files, located in:
~/.config/tanzu/tkg/clusterconfigs
Below are the main values you will need to change. As mentioned above, you need a GPU enabled instance type for the worker nodes, and the OS must be Ubuntu. If not set, the OS version defaults to 20.04.
CONTROL_PLANE_MACHINE_TYPE: t3.large
NODE_MACHINE_TYPE: g4dn.xlarge
OS_ARCH: amd64
OS_NAME: ubuntu
OS_VERSION: "20.04"
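If you clone an existing config, the edits above can be scripted. Below is a minimal sketch, not an official Tanzu tool: the helper names `set_key` and `make_gpu_config` are made up for illustration, and it assumes the cluster config is a flat file of top-level `KEY: value` lines.

```shell
#!/usr/bin/env bash
set -eu

# set_key FILE KEY VALUE
# Replace an existing top-level "KEY: ..." line, or append the key if it is absent.
# (Hypothetical helper; assumes a flat "KEY: value" config file.)
set_key() {
  local file="$1" key="$2" value="$3"
  if grep -q "^${key}:" "$file"; then
    # -i.bak works with both GNU and BSD sed; the backup is removed afterwards.
    sed -i.bak "s|^${key}:.*|${key}: ${value}|" "$file"
    rm -f "${file}.bak"
  else
    # Make sure the file ends with a newline before appending.
    if [ -n "$(tail -c 1 "$file")" ]; then
      printf '\n' >> "$file"
    fi
    printf '%s: %s\n' "$key" "$value" >> "$file"
  fi
}

# make_gpu_config SRC DST
# Copy an existing cluster config and apply the GPU related values from this post.
make_gpu_config() {
  local src="$1" dst="$2"
  cp "$src" "$dst"
  set_key "$dst" CONTROL_PLANE_MACHINE_TYPE t3.large
  set_key "$dst" NODE_MACHINE_TYPE g4dn.xlarge
  set_key "$dst" OS_ARCH amd64
  set_key "$dst" OS_NAME ubuntu
  set_key "$dst" OS_VERSION '"20.04"'
}
```

For example, `make_gpu_config ~/.config/tanzu/tkg/clusterconfigs/existing.yaml ~/.config/tanzu/tkg/clusterconfigs/gpu.yaml` (both file names are placeholders) leaves the original untouched and writes a new file, which you can then pass to `tanzu cluster create` with the `--file` flag.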
The rest of the file you configure as you would for any workload cluster deployment.