Category Archives: Kubernetes


Tanzu Kubernetes Grid – How to edit Node resources and Scale a Cluster Vertically With kubectl

In this blog post, I am going to walk you through how to edit the machine resource configuration for nodes deployed by Tanzu Kubernetes Grid.

Example Issue – Disk Pressure

In my environment, I found I needed to alter my node resources, as several pods in my cluster were being evicted.

By running a describe on the pod, I could see the failure message was due to the node condition DiskPressure.

  • If you need to clean up a high number of pods across namespaces in your environment, see the Quick Tip post further down this page.
kubectl describe pod {name}

[Screenshot: kubectl describe pod – Failed/Evicted, “The node had condition: [DiskPressure]”]

I then looked at the node that the pod was scheduled to (you can see this in the above screenshot, 4th line, “Node”).
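To inspect the node’s conditions, taints, and events:

kubectl describe node {name}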

Below, we can see that the kubelet has tainted the node to stop further pods from being scheduled to it.

In the events, we can see the message “Attempting to reclaim ephemeral-storage”.

[Screenshot: kubectl describe node – DiskPressure condition, taint, and ephemeral-storage reclaim event]

Configuring resources for Tanzu Kubernetes Grid nodes

First, you will need to log in to your Tanzu Kubernetes Grid management cluster that was used to deploy the workload (guest) cluster, as this controls cluster deployments and holds the necessary bootstrap and machine creation configuration.
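For example, switching kubectl to the management cluster context (the context name below is hypothetical; substitute your own):

kubectl config use-context tkg-mgmt-01-admin@tkg-mgmt-01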

Once logged in, locate the existing VsphereMachineTemplate for your chosen cluster. Each cluster will have two such templates (one for control plane nodes, one for worker nodes).

If you have deployed TKG into a public cloud, you can use the following types instead and continue to follow this article, as the theory is the same regardless of where you have deployed:

  • AWSMachineTemplate on Amazon EC2
  • AzureMachineTemplate on Azure
kubectl get VsphereMachineTemplate

[Screenshot: kubectl get VsphereMachineTemplate output]

You can attempt to alter this object directly; however, when you try to save the edit, you will be presented with the following error message:

kubectl edit VsphereMachineTemplate tkg-wld-01-worker

error: vspheremachinetemplates.infrastructure.cluster.x-k8s.io "tkg-wld-01-worker" could not be patched: admission webhook "validation.vspheremachinetemplate.infrastructure.x-k8s.io" denied the request: spec: Forbidden: VSphereMachineTemplateSpec is immutable

[Screenshot: kubectl edit VsphereMachineTemplate – Forbidden: VSphereMachineTemplateSpec is immutable]

Instead, you must output the configuration to a local file and edit it there. You will also need to remove several fields along the way.
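A minimal sketch of that export step, using the worker template name from the example above:

kubectl get vspheremachinetemplate tkg-wld-01-worker -o yaml > tkg-wld-01-worker-new.yaml

Because the spec is immutable, the edited copy must be saved under a new name, applied, and the cluster’s machine deployment updated to reference it.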

Kubernetes

Quick Tip – Kubernetes – Delete all evicted pods across all namespaces

I’m currently troubleshooting an issue with my Kubernetes clusters where pods keep getting evicted, across multiple namespaces.

The issue I am now faced with is keeping on top of the clean-up. When I run:

kubectl get pods -A | grep Evicted

I’m presented with hundreds of returned results.

[Screenshot: kubectl get pods -A | grep Evicted output]

So to quickly clean this up, I can run the following command:
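A common form of that clean-up one-liner (a sketch on my part, not necessarily the exact command from the full post; it deletes each pod reported as Evicted within its own namespace):

kubectl get pods -A | grep Evicted | awk '{print $2 " --namespace=" $1}' | xargs -L1 kubectl delete pod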


Kasten K10 – Air gap installation using Harbor Image Registry

In this blog post, I will cover the steps for an air-gapped installation of Kasten K10, for situations where your Kubernetes cluster doesn’t have internet access to pull the container images directly from their online locations.

Pre-requisites
  • An image registry that is accessible by your Kubernetes cluster
  • A client that can download the container images and then push them to the image registry
    • In this example, I am using my local machine, which has Docker installed.
  • Helm downloaded
    • Run the following to get the Helm chart locally for the install:
helm repo add kasten https://charts.kasten.io/
helm repo update && \
    helm fetch kasten/k10 --version=<k10-version>

Example for Kasten K10 4.5.0

helm repo update && \
    helm fetch kasten/k10 --version=4.5.0

This will download a file, for example "k10-4.5.0.tgz".

Log into your Image Registry

First, you need to ensure that your Docker client (or similar) has authenticated to the image registry which your air-gapped Kubernetes cluster can access.

When using Harbor and Docker, I typically use a robot account for programmatic access.
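A sketch of that login, using a hypothetical Harbor hostname and robot account name:

echo "$ROBOT_TOKEN" | docker login harbor.example.com --username 'robot$k10-push' --password-stdin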

However, when running the Kasten tooling, which we’ll discuss next, I kept hitting an error.

Kubernetes

Kubernetes Troubleshooting – Kubelet Unable to attach or mount volumes – timed out waiting for the condition

The Issue

When I updated my Kasten application in my Kubernetes cluster, I found that one of the pods was stuck in “init” status.

dean@dean [ ~ ] (⎈ |tkg-wld-01-admin@tkg-wld-01:default) # k get pods -n kasten-io -w
NAME                                  READY   STATUS     RESTARTS   AGE
aggregatedapis-svc-78564d4697-wl9wg   1/1     Running    0          3m9s
auth-svc-7977b9684b-zph27             1/1     Running    0          3m11s
catalog-svc-7ff7779b75-kmvsr          0/2     Init:0/2   0          2m43s

[Screenshot: kubectl get pods – catalog-svc pod stuck in Init status]

Running a describe on that pod pointed to the fact that the volume could not be attached.
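The describe command, using this example’s pod name and namespace:

kubectl describe pod catalog-svc-7ff7779b75-kmvsr -n kasten-io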

Events:
  Type     Reason       Age    From               Message
  ----     ------       ----   ----               -------
  Normal   Scheduled    2m58s  default-scheduler  Successfully assigned kasten-io/catalog-svc-7ff7779b75-kmvsr to tkg-wld-01-md-0-54598b8d99-rpqjf
  Warning  FailedMount  55s    kubelet            Unable to attach or mount volumes: unmounted volumes=[catalog-persistent-storage], unattached volumes=[k10-k10-token-lbqpw catalog-persistent-storage]: timed out waiting for the condition

The Cause

Somewhere along the line, I found some stale VolumeAttachments linked to a Kubernetes node that no longer exists in my cluster. This looks to be causing some confusion in the cluster over which node should be attaching the volume.

The image below shows the process; the matching commands are sketched after this list:

  • Find the PersistentVolume name linked to the associated claim from the failure in the pod events
  • Map this to the available VolumeAttachments
  • Reference the node recorded on each VolumeAttachment against the nodes available in the cluster
    • I’ve highlighted the missing node in the red box
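A sketch of the matching commands:

kubectl get pv
kubectl get volumeattachments
kubectl get nodes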

[Screenshot: kubectl get pv / kubectl get volumeattachments / kubectl get nodes – missing node highlighted]

The Fix

The fix is to remove the stale VolumeAttachment.

kubectl delete volumeattachment [volumeattachment_name]

[Screenshot: kubectl delete volumeattachment output]

After this, your pod should eventually retry and complete its mounts, or you can remove the pod and let Kubernetes replace it for you (so long as it’s part of a Deployment or another controller managing your application).
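If you take the second route, using this example’s pod:

kubectl delete pod catalog-svc-7ff7779b75-kmvsr -n kasten-io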

Regards

Dean Lewis


vSphere with Tanzu – Creating cluster fails with “storage class is not valid”

The Issue

When you have attached a vSphere Storage Policy to your vSphere Namespace and then try to create a cluster using the storage policy name, you will find it fails with an error such as:

Error from server (storage class is not valid for control plane VM: StorageClass 'Tanzu Storage Policy' is not assigned for namespace 'deanl', 

storage class is not valid for worker VMs: StorageClass 'Tanzu Storage Policy' is not assigned for namespace 'deanl', 

storage class Tanzu Storage Policy under spec.settings.storage.defaultClass is not valid: StorageClass 'Tanzu Storage Policy' is not assigned for namespace 'deanl'): error when creating "cluster.yaml": 

admission webhook "default.validating.tanzukubernetescluster.run.tanzu.vmware.com" denied the request: storage class is not valid for control plane VM: StorageClass 'Tanzu Storage Policy' is not assigned for namespace 'deanl', 

storage class is not valid for worker VMs: StorageClass 'Tanzu Storage Policy' is not assigned for namespace 'deanl', 

storage class Tanzu Storage Policy under spec.settings.storage.defaultClass is not valid: StorageClass 'Tanzu Storage Policy' is not assigned for namespace 'deanl'

When you look at the vSphere namespace, the Storage Policy is attached.

[Screenshot: vSphere Namespace with the Storage Policy attached]

An example of the erroneous TanzuKubernetesCluster definition YAML:

apiVersion: run.tanzu.vmware.com/v1alpha1
kind: TanzuKubernetesCluster
metadata:
  name: deanl-cluster
  namespace: deanl
spec:
  distribution:
    version: v1.18.5
  topology:
    controlPlane:
      class: best-effort-small
      count: 1
      storageClass: "Tanzu Storage Policy"
    workers:
      class: best-effort-small
      count: 1
      storageClass: "Tanzu Storage Policy"
  settings:
    network:
      cni:
        name: calico
    storage:
      defaultClass: "Tanzu Storage Policy"

The Cause

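In short, judging from the error above: a Kubernetes StorageClass name must be a valid lowercase RFC 1123 name, so the vSphere Storage Policy display name 'Tanzu Storage Policy' cannot be used directly; the policy is surfaced to the vSphere Namespace as a StorageClass with a sanitized name. To check what the namespace actually exposes (names taken from the example above):

kubectl get storageclass
kubectl describe namespace deanl

The cluster YAML should then reference the StorageClass name returned (likely of the form tanzu-storage-policy) rather than the policy’s display name.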