Category Archives: Kubernetes

Kubernetes

How to fix in Kubernetes – Deleting a PVC stuck in status “Terminating”

The Issue

Whilst working on a Kubernetes demo for a customer, I was cleaning up my environment and deleting persistent volume claims (PVCs) that were no longer needed.

I noticed that one PVC was stuck in “terminating” status for quite a while.

Note: I am using the oc command in place of kubectl because this is an OpenShift environment.

The Cause

A quick Google search suggested I needed to verify whether the PVC was still attached to a node in the cluster.

kubectl get volumeattachment

I could see that it was, and the reason behind this was that the PVC's configuration was not fully updated during the delete process, so Kubernetes still considered the volume attached.

The Fix

I found the fix on this GitHub issue log.

You need to patch the PVC to set the “finalizers” field to null; this allows the final unmount from the node, and the PVC can then be deleted.

kubectl patch pvc {PVC_NAME} -p '{"metadata":{"finalizers":null}}'
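Once the patch is applied, a quick way to confirm everything has cleared up (using the same placeholder name as above) is:

# confirm the PVC has been removed
kubectl get pvc {PVC_NAME}

# confirm no volume attachment remains for the backing volume
kubectl get volumeattachment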

Regards

OpenShift

Using the vSphere CSI Driver with OpenShift 4.x and VSAN File Services

You may have seen my blog post “How to Install and configure vSphere CSI Driver on OpenShift 4.x”.

There I updated the vSphere CSI driver to work with the additional security constraints that are baked into OpenShift 4.x.

Since then, one of the things that has been on my list to test is file volumes backed by vSAN File shares. This feature is available in vSphere 7.0.

Well, I’m glad to report it does in fact work. By using my CSI driver (see the above blog or my GitHub), you can simply deploy and consume vSAN File Services, as per the documentation here.

I’ve updated the examples in my GitHub repository to get this working.

OK just tell me what to do…

First and foremost, you need to add additional configuration to the csi conf file (csi-vsphere-for-ocp.conf).

If you do not, the defaults will be assumed, which means full read-write access from any IP to the file shares created.

[Global]

# run the following on your OCP cluster to get the ID 
# oc get clusterversion -o jsonpath='{.items[].spec.clusterID}{"\n"}'
cluster-id = c6d41ba1-3b67-4ae4-ab1e-3cd2e730e1f2

[NetPermissions "A"]
ips = "*"
permissions = "READ_WRITE"
rootsquash = false

[VirtualCenter "10.198.17.253"]
insecure-flag = "true"
user = "administrator@vsphere.local"
password = "Admin!23"
port = "443"
datacenters = "vSAN-DC"
targetvSANFileShareDatastoreURLs = "ds:///vmfs/volumes/vsan:52c229eaf3afcda6-7c4116754aded2de/"
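If you are unsure of the value to use for targetvSANFileShareDatastoreURLs, one way to look it up is with govc; this is a hedged example assuming govc is installed and configured, and “vsanDatastore” is a placeholder for your datastore name:

# the "URL" field in the output gives the ds:///vmfs/volumes/vsan:<uuid>/ path
govc datastore.info vsanDatastore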

Next, create a storage class which is configured to consume vSAN File Services.

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: file-services-sc
  annotations:
    storageclass.kubernetes.io/is-default-class: "false"
provisioner: csi.vsphere.vmware.com
parameters:
  storagepolicyname: "vSAN Default Storage Policy" # Optional Parameter
  csi.storage.k8s.io/fstype: "nfs4" # Optional Parameter

Then create a PVC to prove it works.
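As a minimal sketch (the claim name and size here are placeholders of my own; the storage class name matches the example above), a ReadWriteMany claim would look something like this:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: file-services-pvc        # placeholder name
spec:
  accessModes:
    - ReadWriteMany              # file volumes can be mounted by multiple pods
  resources:
    requests:
      storage: 2Gi               # placeholder size
  storageClassName: file-services-sc

Apply it with oc create -f and check that the claim reaches the Bound state with oc get pvc.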

OpenShift

How to Install and configure vSphere CSI Driver on OpenShift 4.x

Introduction

In this post I am going to install the vSphere CSI Driver version 2.0 with OpenShift 4.x. In my demo environment I’m connecting to a VMware Cloud on AWS SDDC and its vCenter; however, the steps are the same for an on-premises deployment.

I have updated the configuration files available from Red Hat for installing the CSI driver in OpenShift, to make them compatible with the latest CSI driver. You can find these in my GitHub repo;

- Pre-Reqs
  - vCenter Server Role
  - Download the deployment files
  - Create the vSphere CSI secret in OpenShift
  - Create Roles, ServiceAccount and ClusterRoleBinding for vSphere CSI Driver
- Installation
  - Install vSphere CSI driver
  - Verify Deployment
- Create a persistent volume claim
- Using Labels
- Troubleshooting

In your environment, cluster VMs will need the “disk.enableUUID” advanced setting enabled and VM hardware version 15 or higher.
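If you need to set these on existing VMs, a hedged example using govc follows (the VM name is a placeholder, and the VM should be powered off before the hardware upgrade):

# enable the disk UUID advanced setting on a cluster VM
govc vm.change -vm {VM_NAME} -e disk.enableUUID=TRUE

# upgrade the virtual hardware to version 15
govc vm.upgrade -vm {VM_NAME} -version 15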

Pre-Reqs
vCenter Server Role

In my environment I will use the default administrator account; however, in production environments I recommend you follow a strict RBAC procedure, configure the necessary roles, and use a dedicated account for the CSI driver to connect to your vCenter.

To make life easier I have created a PowerCLI script to create the necessary roles in vCenter based on the vSphere CSI documentation;

Download the deployment files

Run the following;

git clone https://github.com/saintdle/vSphere-CSI-Driver-2.0-OpenShift-4.git

Create the vSphere CSI Secret in OpenShift

Edit the file “csi-vsphere-for-OCP.conf” with your vCenter infrastructure details;

[Global]
 
# run the following on your OCP cluster to get the ID
# oc get clusterversion -o jsonpath='{.items[].spec.clusterID}{"\n"}'
cluster-id = "OCP_CLUSTER_ID"

[VirtualCenter "VC_FQDN"]
insecure-flag = "true"
user = "USER"
password = "PASSWORD"
port = "443"
datacenters = "VC_DATACENTER"

Create the secret;

oc create secret generic vsphere-config-secret --from-file=csi-vsphere-for-OCP.conf --namespace=kube-system

oc get secret vsphere-config-secret --namespace=kube-system

This configuration is for block volumes; it is also supported to configure access to vSAN File volumes, and you can see an example of that configuration here;

Remove your “csi-vsphere-for-OCP.conf” file once the secret is created, as it contains your vCenter password in clear text.

Create Roles, ServiceAccount and ClusterRoleBinding for vSphere CSI Driver

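As a hedged sketch of this step (the manifest filename is my assumption; use the RBAC manifest from the repository cloned earlier, and the service account name and namespace follow the upstream driver manifests), the controller service account is created and then granted the privileged SCC so the driver pods can run under OpenShift's security constraints:

# create the service account, roles and bindings (filename assumed from the cloned repo)
oc apply -f vsphere-csi-controller-rbac.yaml

# allow the CSI controller service account to use the privileged SCC
oc adm policy add-scc-to-user privileged -z vsphere-csi-controller -n kube-system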

OpenShift

How to deploy OpenShift 4.3 on VMware vSphere with Static IP addresses using Terraform

Install OpenShift 4.x on vSphere 6.x/7.x

The following procedure is intended to create VMs from an OVA template that boot with static IPs when the DHCP server cannot reserve the IP addresses.

The Problem

OCP requires that all DNS configurations be in place. VMware requires that DHCP assigns the correct IPs to the VMs. Since many real installations require coordination with different teams in an organization, we often don’t have control of the DNS, DHCP or load balancer configurations.

The CoreOS documentation explains how to create configurations using Ignition files. I created a Python script that injects the network configuration into the Ignition files created by the openshift-install program.

Reference Architecture

For this guide, we are going to deploy 3 master nodes (control plane) and 2 worker nodes (compute). This guide uses RHEL CoreOS 4.3 as the virtual machine image, deploying Red Hat OCP 4.3, in line with Red Hat’s N-1 support policy.

We will use a centralised Linux server (Ubuntu) that will perform the following functions;

  • Load Balancer – HAProxy
  • Web Server – Apache2
  • Terraform automation host – version 0.11.14
    • The deployment will be semi-automated using Terraform, so that we can easily build the configuration files used by the CoreOS VMs that have static IP settings.
    • Using a later version of Terraform will cause failures.
  • Client Tools for OpenShift deployment
    • OC
    • Kubectl
    • Openshift-install

DNS will be provided by a Windows Server.

The installation will use a Bootstrap server to bring the cluster online, which will be removed at the end of the build process.

Deployment Steps

In this guide we will deploy our environment in the following order;

  • Configure DNS
  • Import Red Hat Core OS image into vCenter
  • Deploy Ubuntu Host
    • Configure Apache
    • Configure HAProxy
    • Install Client-Tools
    • Install Terraform
  • Build OpenShift Cluster configuration
  • Configuring the Terraform deployment
  • Running the Terraform deployment

DNS

OpenShift uses a “clusterName.BaseDomain” format.

For example, I want to call my OpenShift cluster “demo” and my DNS domain is simon.local, so the full name used by OpenShift is “demo.simon.local”.

Below is a table plan of the IP addresses you will use to build the environment.

The last three addresses are cluster level resources that are available on each control-plane node, accessible via the load balancer.

To configure the DNS records in Windows, you can use the script and CSV file here.

In the below screenshot, the script has created the “demo” domain folder and entered my records. It is important that you have PTR records set up for everything apart from the “etcd-X” records.
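As a hedged illustration of the records involved (the host names follow the OpenShift 4.3 UPI documentation; the targets are described rather than real IPs), the demo.simon.local zone would contain entries along these lines:

api.demo.simon.local                      A     -> load balancer
api-int.demo.simon.local                  A     -> load balancer
*.apps.demo.simon.local                   A     -> load balancer
bootstrap.demo.simon.local                A     -> bootstrap VM
master-0/1/2.demo.simon.local             A     -> each master VM
worker-0/1.demo.simon.local               A     -> each worker VM
etcd-0/1/2.demo.simon.local               A     -> the corresponding master VMs (no PTR needed)
_etcd-server-ssl._tcp.demo.simon.local    SRV   -> etcd-0/1/2 on port 2380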

Import Red Hat CoreOS Image into vCenter

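A hedged example of this step using govc (the template name and OVA filename are placeholders; use the RHCOS 4.3 OVA downloaded from the Red Hat mirror):

# import the RHCOS OVA into vCenter to use as the VM template
govc import.ova -name=rhcos-4.3 ./rhcos-4.3.8-x86_64-vmware.x86_64.ova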

Kubernetes

Kubernetes command line: tips and tricks

In this blog post, I have collected together a number of tips, tricks and snippets I’ve learned along the way whilst learning Kubernetes.

- Configure tab completion
- Selecting all namespaces in commands
- Restarting nodes
- Setting default storage class
- Resource usage
- Delete pods that are stuck terminating
- Using the watch command
- Troubleshooting
  - Run an interactive pod for debugging issues
    - Alpine & BusyBox
  - Check etcd is running on master nodes
  - Get deployed pod image
  - Get Kubelet Service Logs
  - Get events from all namespaces, sorted by creation time
- Other Resources
  - Visual guide on troubleshooting Kubernetes deployments
  - Tool: Stern for tailing multiple Kubernetes objects logs
  - Useful Aliases to create for managing Kubernetes

I would also highly recommend the awesome Kubectl Cheat Sheet as one of your go-to references.

Configure Tab completion
source <(kubectl completion bash)
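To make this persistent across sessions, you can append it to your shell profile (this is the standard kubectl completion setup):

echo 'source <(kubectl completion bash)' >> ~/.bashrc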
Selecting all namespaces in commands

Rather than using “--all-namespaces”, you can use “-A”.

kubectl get pods --all-namespaces

kubectl get pods -A
Restarting Nodes

SSH to the problematic node and run

/etc/init.d/kubelet restart

Source
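On systemd-based nodes (most current distributions), the equivalent is usually:

sudo systemctl restart kubelet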

Setting default storage class

Remove default storage class setting

kubectl patch storageclass {SC_NAME} -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'

Configure storage class as default

kubectl patch storageclass {SC_NAME} -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'

Source
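To check which storage class is currently the default (the default one is flagged in the NAME column of the output):

kubectl get storageclass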

Resource Usage

Requires metrics-server to be installed and running (github)

Pods;

#Check what pods are using the most memory in the cluster:
kubectl top pod --all-namespaces  | sort -rnk4 | head -40
 
#Check what pods are using the most CPU in the cluster:
kubectl top pod --all-namespaces  | sort -rnk3 | head -80

Nodes;

#Check which nodes are using the most memory in the cluster (nodes are not namespaced, so no --all-namespaces flag):
kubectl top nodes | sort -rnk4 | head -40

#Check which nodes are using the most CPU in the cluster:
kubectl top nodes | sort -rnk3 | head -80

Verify Kubelet is exposing Node metrics;

kubectl get --raw /api/v1/nodes/{Node_Name}/proxy/stats/summary

To get metrics-server working I had to add the following two kubelet arguments to the container spec in the deployment:

kubectl edit deployment metrics-server -n kube-system
#############
    spec:
      containers:
      - name: metrics-server
        args:
        # the two kubelet args below are the additions
        - --kubelet-preferred-address-types=InternalIP
        - --kubelet-insecure-tls

Delete pods that are stuck terminating
kubectl get pods --all-namespaces | grep Terminating | while read line; do pod_name=$(echo $line | awk '{print $2}') && name_space=$(echo $line | awk '{print $1}' ); kubectl delete pods $pod_name -n $name_space --grace-period=0 --force ; done
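For a single pod, the equivalent (using placeholder names) is:

kubectl delete pod {POD_NAME} -n {NAMESPACE} --grace-period=0 --force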
Using the Watch command

Really simple one, but when deploying things, sometimes you don’t get the feedback you need from the system. However, by putting the Linux watch command in front of your Kubernetes commands, you can keep an eye on progress;

watch -n 2 kubectl get pods -n {namespace}

In the above example, this command will refresh your page every 2 seconds and list out the available pods and status.

Troubleshooting:
Run an interactive pod for debugging

This will create a pod from one of the images below, which will be removed when you exit the session.

Alpine;

kubectl run -i --rm -t alpine-$USER --image=alpine --restart=Never -- /bin/sh

Press enter

BusyBox

kubectl run -i --tty --rm debug --image=busybox --restart=Never -- sh

Press enter

Source

Check etcd is running on master nodes

Check etcd pods have been created by Kubelet

sudo crictl pods --name=etcd-member

or 

sudo crictl ps -a

Check etcd logs on master nodes

sudo crictl logs $(sudo crictl ps --pod=$(sudo crictl pods --name=etcd-member --quiet) --quiet)

Source

Get pod deployed image
kubectl get pod {name} -n {namespace} -o "jsonpath={range .status.containerStatuses[*]}{.name}{'\t'}{.state}{'\t'}{.image}{'\n'}{end}"

Example: 

kubectl get pods nginx -o "jsonpath={range .status.containerStatuses[*]}{.name}{'\t'}{.state}{'\t'}{.image}{'\n'}{end}"

nginx map[running:map[startedAt:2020-06-10T15:44:40Z]] nginx:latest

Get Kubelet Service logs

SSH to your node and run the following

journalctl -f -u kubelet.service
Get events from all namespaces, sorted by creation time
kubectl get events -A  --sort-by='.metadata.creationTimestamp'
Other Resources

A visual guide on troubleshooting Kubernetes deployments

Tool: Stern allows you to tail multiple pods on Kubernetes and multiple containers within the pod. Each result is colour coded for quicker debugging.

This can be more useful than the kubectl logs command, for which you need to know the name of each individual pod.

Tail logs of all pods of the deployment/service
 CMD: stern -n {Namespace} {deployment}
 
Same as above but starting with logs in the last minute
 CMD: stern -n {Namespace} {deployment} -s 1m

Useful aliases, which can be used without ZSH;
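As a hedged example of the kind of aliases meant here (these particular ones are common choices rather than the original list; the completion line comes from the kubectl cheat sheet):

# shorten kubectl to k and keep tab completion working for the alias
alias k=kubectl
complete -F __start_kubectl k

# common shortcuts
alias kgp='kubectl get pods'
alias kgn='kubectl get nodes'
alias kaf='kubectl apply -f '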

Regards