In this blog post I’m going to deep dive into the end-to-end activation, deployment, and consumption of the managed Tanzu Services (Tanzu Kubernetes Grid Service, TKGS) within a VMware Cloud on AWS SDDC. I’ll deploy a Tanzu cluster inside a vSphere Namespace, then deploy my trusty Pac-Man application and make it publicly accessible.
Prior to this capability, you would deploy Tanzu Kubernetes Grid to VMC as a management cluster, then additional Tanzu clusters for your workloads (see terminology explanations here). This was a fully supported option; however, it did not provide all the integrated features you get by using TKGS as part of your on-premises vSphere environment.
What is Tanzu Services on VMC?
Tanzu Kubernetes Grid Service is a managed service built into the VMware Cloud on AWS vSphere environment.
This feature brings the integrated Tanzu Kubernetes Grid Service inside of vSphere itself. By coupling the platforms together, you can easily deploy new Tanzu clusters, use the administration and authentication of vCenter, and apply governance and policies from vCenter as well.
Note: VMware Cloud on AWS does not enable activation of Tanzu Kubernetes Grid by default. Contact your account team for more information.
Note 2: In VMware Cloud on AWS, the Tanzu workload control plane can be activated only through the VMC Console.
But wait, couldn’t I already install a Tanzu Kubernetes Grid Cluster onto VMC anyway?
Tanzu Kubernetes Grid is a multi-cloud solution that deploys and manages Kubernetes clusters on your selected cloud provider. Prior to the vSphere-integrated Tanzu offering for VMC that we are discussing today, you would deploy the general TKG option to your SDDC vCenter.
What differences should I know about this Tanzu Services offering in VMC versus the other Tanzu Kubernetes offering?
When activated, Tanzu Kubernetes Grid for VMware Cloud on AWS is pre-provisioned with a VMC-specific content library that you cannot modify.
Tanzu Kubernetes Grid for VMware Cloud on AWS does not support vSphere Pods.
Creation of Tanzu Supervisor Namespace templates is not supported by VMware Cloud on AWS.
vSphere namespaces for Kubernetes releases are configured automatically during Tanzu Kubernetes Grid activation.
Activating Tanzu Kubernetes Grid Service in a VMC SDDC
Reminder: Tanzu Services Activation capabilities are not activated by default. Contact your account team for more information.
Within your VMC Console, you can either go via the Launchpad method or via the SDDC inventory item. I’ll cover both:
Click on Launchpad
Open the Kubernetes Tab
Click Learn More
Select the Journey Tab
Under Stage 2 – Activate > Click Get Started
Alternatively, from the SDDC object in the Inventory view
Click Actions
Click “Activate Tanzu Kubernetes Grid”
You will now be shown a status dialog as VMC checks whether Tanzu Kubernetes Grid Service can be activated in your cluster.
This verifies that you have the correct configuration and compute resources available.
In this blog post I’m going to detail how to deploy and configure an NVIDIA GPU-enabled Tanzu Kubernetes Grid cluster in AWS. The method is similar for Azure; for vSphere there are a number of additional steps to prepare the system. I’m going to essentially follow the official documentation, then run some of the NVIDIA tests. As always, it’s good to have a visual reference for these kinds of deployments.
For this blog I’ve already deployed my TKG Management cluster in AWS
Deploy a GPU enabled workload cluster
It’s simple: just deploy a workload cluster whose compute plane nodes (workers) use a GPU-enabled instance type.
You can create a new cluster YAML configuration file from scratch, or clone one of your existing files located in:
~/.config/tanzu/tkg/clusterconfigs
Below are the four main values you will need to change. As mentioned above, you need a GPU-enabled instance, and the OS must be Ubuntu. The OS version defaults to 20.04 if not set.
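As a rough sketch, the relevant values in the cluster configuration file might look like the following. The cluster name is illustrative, and `g4dn.xlarge` is just one example of an AWS NVIDIA GPU-enabled instance type:

```yaml
CLUSTER_NAME: tkg-gpu-wld-01     # illustrative name
NODE_MACHINE_TYPE: g4dn.xlarge   # a GPU-enabled AWS instance type (NVIDIA T4)
OS_NAME: ubuntu                  # GPU support requires Ubuntu node images
OS_VERSION: "20.04"              # defaults to 20.04 if not set
```

The cluster is then created as usual, e.g. `tanzu cluster create --file ~/.config/tanzu/tkg/clusterconfigs/tkg-gpu-wld-01.yaml`.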
By running a describe on the pod, I could see the failure message was due to the node condition DiskPressure.
If you need to clean up a high number of pods across namespaces in your environment, see this blog post.
kubectl describe pod {name}
I then looked at the node that the pod was scheduled to. (You can see this in the above screenshot, 4th line, “node”.)
Below we can see that the kubelet has tainted the node to stop further pods from being scheduled to it.
In the events we see the message “Attempting to reclaim ephemeral-storage”
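To confirm the condition and taint yourself, the standard kubectl commands below can be used against the affected node (the node name is a placeholder):

```shell
# Inspect the node the failing pod was scheduled to
kubectl describe node <node-name>

# In the output, look for:
#   Conditions:  DiskPressure  True
#   Taints:      node.kubernetes.io/disk-pressure:NoSchedule
```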
Configuring resources for Tanzu Kubernetes Grid nodes
First you will need to log into the Tanzu Kubernetes Grid management cluster that was used to deploy the workload (guest) cluster, as this controls cluster deployments and holds the necessary bootstrap and machine creation configuration.
Once logged in, locate the existing VsphereMachineTemplate for your chosen cluster. Each cluster will have two configurations (one for Control Plane nodes, one for Compute plane/worker nodes).
If you have deployed TKG into a public cloud, then you can use the following types instead, and continue to follow this article as the theory is the same regardless of where you have deployed to:
AWSMachineTemplate on Amazon EC2
AzureMachineTemplate on Azure
kubectl get VsphereMachineTemplate
You can attempt to alter this object directly; however, when you try to save the edited file, you will be presented with the following error message:
kubectl edit VsphereMachineTemplate tkg-wld-01-worker
error: vspheremachinetemplates.infrastructure.cluster.x-k8s.io "tkg-wld-01-worker" could not be patched: admission webhook "validation.vspheremachinetemplate.infrastructure.x-k8s.io" denied the request: spec: Forbidden: VSphereMachineTemplateSpec is immutable
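Because the webhook marks the spec as immutable, the usual approach is to export the existing template, save it under a new name with the changed resources, apply it, and then point the cluster’s MachineDeployment at the new template. A sketch of this, with assumed object names (`tkg-wld-01-worker-v2` and `tkg-wld-01-md-0` are placeholders for your environment):

```shell
# Export the existing (immutable) template to a file
kubectl get vspheremachinetemplate tkg-wld-01-worker -o yaml > tkg-wld-01-worker-v2.yaml

# Edit the file: change metadata.name (e.g. tkg-wld-01-worker-v2)
# and the spec values you need, then apply it as a new object
kubectl apply -f tkg-wld-01-worker-v2.yaml

# Point the MachineDeployment at the new template, which triggers
# a rolling replacement of the worker nodes
kubectl patch machinedeployment tkg-wld-01-md-0 --type merge -p \
  '{"spec":{"template":{"spec":{"infrastructureRef":{"name":"tkg-wld-01-worker-v2"}}}}}'
```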
In this blog post, I am going to cover the new support for Tanzu Kubernetes Grid Management clusters on both VMware Cloud on AWS (VMC) and Azure VMware Solution (AVS). This functionality also allows the provisioning of new Tanzu Kubernetes workload clusters (TKC) to the relevant platform, provisioned by the lifecycle management controls within Tanzu Mission Control.
Below are the other blog posts I’ve written covering Tanzu Mission Control.
Below are the relevant release notes for the features I’ll cover. In this blog post, I’ll just be showing screenshots for a VMC environment, however the same applies to AVS as well.
What's New May 26, 2021
New Features and Improvements
(New Feature update): Tanzu Mission Control now supports the ability to register Tanzu Kubernetes Grid (1.3 & later) management clusters running in vSphere on Azure VMware Solution.
What's New April 30, 2021
New Features and Improvements
(New Feature update): Tanzu Mission Control now supports the ability to register Tanzu Kubernetes Grid (1.2 & later) management clusters running in vSphere on VMware Cloud on AWS. For a list of supported environments, see Requirements for Registering a Tanzu Kubernetes Cluster with Tanzu Mission Control in VMware Tanzu Mission Control Concepts.
Deployment of the first management cluster itself is not supported from TMC, nor is it supported for a management cluster to deploy workload clusters across platforms. For example, a management cluster running in AWS cannot deploy workload clusters to VMC, AVS, or Azure.