Tag Archives: TKG

Deploying Nvidia GPU enabled Tanzu Kubernetes Clusters

In this blog post I’m going to detail how to deploy and configure an Nvidia GPU-enabled Tanzu Kubernetes Grid cluster in AWS. The method is similar for Azure; for vSphere, there are a number of additional steps to prepare the system. I’m essentially going to follow the official documentation, then run some of the Nvidia tests. As always, it’s useful to have a visual reference for these kinds of deployments.

Pre-Reqs
  • Nvidia currently supports only Ubuntu-based images for TKG deployments
  • For this blog I’ve already deployed my TKG Management cluster in AWS
Deploy a GPU enabled workload cluster

It’s simple: deploy a workload cluster whose compute plane nodes (workers) use a GPU-enabled instance type.

You can create a new cluster YAML file from scratch, or copy one of your existing files, located in:

~/.config/tanzu/tkg/clusterconfigs

Below are the main values you will need to change. As mentioned above, you need a GPU-enabled instance type and Ubuntu as the OS. If OS_VERSION is not set, it defaults to 20.04.

CONTROL_PLANE_MACHINE_TYPE: t3.large
NODE_MACHINE_TYPE: g4dn.xlarge
OS_ARCH: amd64
OS_NAME: ubuntu
OS_VERSION: "20.04"
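Before deploying, a quick pre-flight check that the config file actually sets a GPU worker type and Ubuntu can save a failed deployment. This is a minimal sketch, not part of the official tooling: `config_value` and `check_gpu_config` are hypothetical helpers, the list of AWS GPU instance families is deliberately non-exhaustive, and the cluster and file names in the usage comment are examples.

```shell
#!/usr/bin/env bash
# Pre-flight check for the GPU-related values in a TKG workload cluster
# config file. config_value and check_gpu_config are hypothetical helpers.

# Print the value of a top-level "KEY: value" line, with quotes stripped.
config_value() {
  local file="$1" key="$2"
  grep -E "^${key}:" "$file" | head -n 1 | sed -E "s/^${key}:[[:space:]]*//; s/[\"']//g"
}

# Return non-zero if the worker instance type or OS looks wrong for Nvidia GPUs.
check_gpu_config() {
  local file="$1" node_type os_name
  node_type="$(config_value "$file" NODE_MACHINE_TYPE)"
  os_name="$(config_value "$file" OS_NAME)"

  # Non-exhaustive list of AWS GPU instance families:
  # g4dn = NVIDIA T4, g5 = NVIDIA A10G, p3 = V100, p4d = A100.
  case "$node_type" in
    g4dn.*|g5.*|p3.*|p4d.*) ;;
    *) echo "NODE_MACHINE_TYPE '$node_type' is not a known GPU instance type" >&2; return 1 ;;
  esac

  # Nvidia support for TKG currently requires Ubuntu images.
  if [ "$os_name" != "ubuntu" ]; then
    echo "OS_NAME must be ubuntu for Nvidia support (got '$os_name')" >&2
    return 1
  fi
  echo "OK: $node_type workers on $os_name"
}

# Usage (file and cluster names are examples):
#   CFG=~/.config/tanzu/tkg/clusterconfigs/tkg-gpu-01.yaml
#   check_gpu_config "$CFG" && tanzu cluster create tkg-gpu-01 --file "$CFG"
```

Failing fast in the shell keeps a typo in the instance type from turning into a half-provisioned cluster.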

Configure the rest of the file as you would for any other workload cluster deployment. Continue reading Deploying Nvidia GPU enabled Tanzu Kubernetes Clusters

Tanzu Kubernetes Grid – How to edit Node resources and Scale a Cluster Vertically With kubectl

In this blog post I am going to walk you through how to edit the Machine Resource configurations for nodes deployed by Tanzu Kubernetes Grid.

Example Issue – Disk Pressure

In my environment, I found I needed to alter my node resources, as several pods in my cluster were being evicted.

By running a describe on the pod, I could see the failure message was due to the node condition DiskPressure.

  • If you need to clean up a high number of pods across namespaces in your environment, see this blog post.
kubectl describe pod {name}

TKG - kubectl describe pod - failed - evicted - pod the node had condition disk pressure
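Rather than describing evicted pods one at a time, you can list every evicted pod across all namespaces first. A minimal sketch: `list_evicted_pods` is a hypothetical helper that reads the plain-text output of `kubectl get pods --all-namespaces --no-headers`, where the fourth column (STATUS) reads `Evicted` for evicted pods.

```shell
# Print "namespace/name" for each pod whose STATUS column reads Evicted.
# Reads the output of `kubectl get pods --all-namespaces --no-headers`,
# whose columns are NAMESPACE NAME READY STATUS RESTARTS AGE.
list_evicted_pods() {
  awk '$4 == "Evicted" { print $1 "/" $2 }'
}

# Usage against a live cluster:
#   kubectl get pods --all-namespaces --no-headers | list_evicted_pods
# Review the list before deleting anything, e.g. for a single entry:
#   kubectl delete pod -n <namespace> <name>
```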

I then looked at the node that the pod was scheduled to. (You can see this in the above screenshot, 4th line “node”.)

Below we can see that the kubelet has tainted the node to stop further pods from being scheduled to it.

In the events, we see the message “Attempting to reclaim ephemeral-storage”.

TKG - kubectl describe node - disk pressure
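The condition and taint shown in the screenshot can also be queried directly with kubectl's JSONPath output, which is handy when checking several nodes. A minimal sketch: the function names are hypothetical, and it relies on the `DiskPressure` node condition and the `node.kubernetes.io/disk-pressure` taint that Kubernetes applies under disk pressure.

```shell
# Print the status ("True"/"False") of the node's DiskPressure condition.
disk_pressure_status() {
  kubectl get node "$1" \
    -o jsonpath='{.status.conditions[?(@.type=="DiskPressure")].status}'
}

# Print the keys of all taints currently on the node, space separated.
node_taint_keys() {
  kubectl get node "$1" -o jsonpath='{.spec.taints[*].key}'
}

# Succeed when the node reports DiskPressure=True or carries the
# node.kubernetes.io/disk-pressure taint; fail otherwise.
node_under_disk_pressure() {
  [ "$(disk_pressure_status "$1")" = "True" ] && return 0
  case " $(node_taint_keys "$1") " in
    *" node.kubernetes.io/disk-pressure "*) return 0 ;;
  esac
  return 1
}

# Usage (node name is a placeholder):
#   node_under_disk_pressure <node-name> && kubectl describe node <node-name>
```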

Configuring resources for Tanzu Kubernetes Grid nodes

First, log into the Tanzu Kubernetes Grid management cluster that was used to deploy the workload (guest) cluster, as it controls cluster deployments and holds the necessary bootstrap and machine creation configuration.
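One way to point kubectl at the management cluster is to switch to its admin context. A minimal sketch, assuming the default TKG context naming `<cluster>-admin@<cluster>`; `use_mgmt_context` is a hypothetical helper and the cluster name in the usage comment is an example, so confirm the real context name with `kubectl config get-contexts` first.

```shell
# Switch kubectl to the TKG management cluster's admin context.
# Assumes the default TKG naming <cluster>-admin@<cluster>.
use_mgmt_context() {
  local mgmt="$1"
  kubectl config use-context "${mgmt}-admin@${mgmt}"
}

# Usage (management cluster name is an example):
#   kubectl config get-contexts
#   use_mgmt_context tkg-mgmt-01
```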

Once logged in, locate the existing VSphereMachineTemplate for your chosen cluster. Each cluster will have two templates (one for control plane nodes, one for compute plane/worker nodes).

If you have deployed TKG into a public cloud, then you can use the following types instead, and continue to follow this article as the theory is the same regardless of where you have deployed to:

  • AWSMachineTemplate on Amazon EC2
  • AzureMachineTemplate on Azure
kubectl get VsphereMachineTemplate

TKG - kubectl get VsphereMachineTemplate
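Since only the resource kind changes between platforms, the lookup can be parameterised. A minimal sketch: `machine_template_kind` and its `vsphere`/`aws`/`azure` arguments are hypothetical; the kinds it returns are the Cluster API machine template types listed above.

```shell
# Map the platform TKG was deployed to onto its Cluster API machine template kind.
machine_template_kind() {
  case "$1" in
    vsphere) echo "VSphereMachineTemplate" ;;
    aws)     echo "AWSMachineTemplate" ;;
    azure)   echo "AzureMachineTemplate" ;;
    *)       echo "unknown platform: $1" >&2; return 1 ;;
  esac
}

# Usage, run against the management cluster:
#   kubectl get "$(machine_template_kind aws)"
```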

You can attempt to edit this object directly; however, when you try to save the change, you will be presented with the following error message:

kubectl edit VsphereMachineTemplate tkg-wld-01-worker

error: vspheremachinetemplates.infrastructure.cluster.x-k8s.io "tkg-wld-01-worker" could not be patched: admission webhook "validation.vspheremachinetemplate.infrastructure.x-k8s.io" denied the request: spec: Forbidden: VSphereMachineTemplateSpec is immutable

TKG - kubectl edit VsphereMachineTemplate - Forbidden- VSphereMachineTemplateSpec is immutable

Instead, you must output the configuration to a local file and edit it. If you use my method below, you will also need to remove the following fields. Continue reading Tanzu Kubernetes Grid – How to edit Node resources and Scale a Cluster Vertically With kubectl
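Outputting the existing template to a local file uses kubectl's YAML output. A minimal sketch: `export_machine_template` is a hypothetical wrapper, and it only covers the export step; the exported file still needs editing before it can be applied.

```shell
# Export an existing machine template to <name>.yaml for local editing.
# Usage: export_machine_template <kind> <name>
export_machine_template() {
  local kind="$1" name="$2"
  kubectl get "$kind" "$name" -o yaml > "${name}.yaml" || return 1
  echo "Exported ${kind}/${name} to ${name}.yaml"
}

# Usage (names taken from the example above):
#   export_machine_template VSphereMachineTemplate tkg-wld-01-worker
```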

Tanzu Mission Control – TKG Management support and provisioning new clusters

In this blog post, I am going to cover the new support for Tanzu Kubernetes Grid Management clusters on both VMware Cloud on AWS (VMC) and Azure VMware Solution (AVS). This functionality also allows the provisioning of new Tanzu Kubernetes workload clusters (TKC) to the relevant platform, provisioned by the lifecycle management controls within Tanzu Mission Control.

Below are the other blog posts I’ve written covering Tanzu Mission Control.

Tanzu Mission Control 
- Getting Started Tanzu Mission Control 
- Cluster Inspections 
- Workspaces and Policies  
- Data Protection 
- Deploying TKG clusters to AWS 
- Upgrading a provisioned cluster 
- Delete a provisioned cluster 
- TKG Management support and provisioning new clusters
- TMC REST API - Postman Collection
Release Notes

Below are the relevant release notes for the features I’ll cover. In this blog post, I’ll only show screenshots for a VMC environment; however, the same applies to AVS.

What's New May 26, 2021

New Features and Improvements

    (New Feature update): Tanzu Mission Control now supports the ability to register Tanzu Kubernetes Grid (1.3 & later) management clusters running in vSphere on Azure VMware Solution.

What's New April 30, 2021

New Features and Improvements

    (New Feature update): Tanzu Mission Control now supports the ability to register Tanzu Kubernetes Grid (1.2 & later) management clusters running in vSphere on VMware Cloud on AWS. For a list of supported environments, see Requirements for Registering a Tanzu Kubernetes Cluster with Tanzu Mission Control in VMware Tanzu Mission Control Concepts.
Prerequisites

TMC does not support deploying the first management cluster itself, nor does it support a management cluster deploying workload clusters across platforms. For example, a management cluster running in AWS cannot deploy workload clusters to VMC, AVS or Azure.

The following requirements are from the product documentation.

  • The management cluster must be deployed as a production cluster with multiple control plane nodes
    • However, in my demo lab I was able to successfully run this using a development deployment.
  • Tanzu Kubernetes Grid workload clusters need at least 4 CPUs and 8 GB of memory
    • Again, I deployed a small instance type (2 vCPU, 4GB RAM) and this didn’t seem to be an issue.
  • Tanzu Kubernetes Grid management clusters (version 1.3 or later) running in vSphere on Azure VMware Solution (AVS).
  • Tanzu Kubernetes Grid management clusters (version 1.2 or later) running in vSphere, including vSphere on VMware Cloud on AWS (version 1.12 or 1.14).
  • Do not attempt to register any other kind of management cluster with Tanzu Mission Control.
  • Tanzu Mission Control does not support the registration of Tanzu Kubernetes Grid management clusters prior to version 1.2.
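The platform-specific version minimums above can be expressed as a small pre-registration check. A minimal sketch: the helper names and the `vmc`/`avs` platform keys are hypothetical, versions are assumed to be plain dotted numbers such as 1.3.1 (no leading "v"), `sort -V` must be available, and the other requirements (such as a production deployment) are not checked.

```shell
# Succeed when dotted version $1 is greater than or equal to $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Minimum TKG management cluster version per platform, per the list above.
tmc_min_tkg_version() {
  case "$1" in
    vmc) echo "1.2" ;;   # vSphere on VMware Cloud on AWS
    avs) echo "1.3" ;;   # vSphere on Azure VMware Solution
    *)   return 1 ;;     # any other kind of management cluster: do not register
  esac
}

# Usage: tmc_can_register <vmc|avs> <tkg-version>
tmc_can_register() {
  local min
  min="$(tmc_min_tkg_version "$1")" || return 1
  version_ge "$2" "$min"
}
```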
Registering our Tanzu Kubernetes Grid Management Cluster
  • Go to Administration> Management Clusters > Register Management Cluster > Tanzu Kubernetes Grid

Tanzu Mission Control - Administration - Register Management Cluster - Tanzu Kubernetes Grid Continue reading Tanzu Mission Control – TKG Management support and provisioning new clusters

Deploying Tanzu Kubernetes Grid to AWS fails with ‘InstanceProvisionFailed’

The issue

When deploying Tanzu Kubernetes Grid to AWS, the deployment was failing with the following output:

unable to set up management cluster, : unable to wait for cluster and get the cluster kubeconfig: error waiting for cluster to be provisioned (this may take a few minutes): cluster creation failed, reason:'InstanceProvisionFailed @ Machine/tkg-aws-mgmt-control-plane-dqb4v', message:'1 of 2 completed'
The Cause

When we reviewed the CAPA (Cluster API Provider AWS) logs, we found the following errors logged: Continue reading Deploying Tanzu Kubernetes Grid to AWS fails with ‘InstanceProvisionFailed’
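For reference, the CAPA controller logs can be pulled with `kubectl logs`. A minimal sketch, assuming the provider's default `capa-system` namespace, `capa-controller-manager` deployment and `manager` container; `capa_logs` is a hypothetical wrapper. While a management cluster is still being created, these controllers run on the temporary bootstrap (kind) cluster, so point kubectl at that cluster first.

```shell
# Print logs from the CAPA controller; extra arguments are passed to kubectl.
capa_logs() {
  kubectl logs -n capa-system deployment/capa-controller-manager -c manager "$@"
}

# Usage:
#   capa_logs --tail=200 | grep -i error
```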

Understanding the VMware Tanzu Kubernetes Terminology

I often see questions asking for an explanation of VMware Tanzu Kubernetes terminology and the differences between similarly named products, such as the tweet below. This blog post addresses the Tanzu Kubernetes terminology and its use.

Twitter thread asking about TKGm and TKGs

First, we’ll break down the high-level names and products, then move on to the Tanzu Kubernetes products.

What is VMware Tanzu?

VMware Tanzu is a brand name covering VMware’s modern applications suite of products, just like vRealize is the suite name for VMware’s cloud management products.

What products are covered by the VMware Tanzu brand?

Continue reading Understanding the VMware Tanzu Kubernetes Terminology