Tag Archives: Kubernetes

vRealize Operations – Monitoring Kubernetes with Prometheus and Telegraf

In this post, I will cover how to deploy Prometheus and the Telegraf exporter, and how to configure them so that the data can be collected by vRealize Operations.

Overview

vRealize Operations delivers intelligent operations management with application-to-storage visibility across physical, virtual, and cloud infrastructures. Using policy-based automation, operations teams can automate key processes and improve IT efficiency.

Prometheus is an open-source systems monitoring and alerting toolkit. It collects and stores its metrics as time series data, i.e. metrics information is stored with the timestamp at which it was recorded, alongside optional key-value pairs called labels.

There are several libraries and servers which help in exporting existing metrics from third-party systems as Prometheus metrics. This is useful for cases where it is not feasible to instrument a given system with Prometheus metrics directly (for example, HAProxy or Linux system stats).

Telegraf is a plugin-driven server agent written by the folks over at InfluxData for collecting & reporting metrics. By using the Telegraf exporter, the following Kubernetes metrics are supported:

Why do it this way with three products?

You can actually achieve this with two products (vROPs and cAdvisor, for example): vRealize Operations plus a metric exporter in the Kubernetes cluster that the data can be collected from. By default, Kubernetes offers little in the way of metrics data until you install an appropriate package to provide it.

Many customers have now decided upon using Prometheus for their metrics needs in their Modern Applications world due to the flexibility it offers.

Therefore, this integration provides a way for vRealize Operations to collect the data through an existing Prometheus deployment and enrich the data further by providing a context-aware relationship view between your virtualisation platform and the Kubernetes platform which runs on top of it.

The vRealize Operations Management Pack for Kubernetes supports a number of Prometheus exporters that can provide the relevant data. In this blog post we will focus on Telegraf.

You can view sample deployments here for all the supported types. This blog will show you an end-to-end setup and deployment.

Prerequisites
  • Administrative access to a vRealize Operations environment
  • Access to a Kubernetes cluster that you want to monitor
  • Install Helm, if you have not already set it up, on the machine which has access to your Kubernetes cluster
  • Clone this GitHub repo to your machine to make life easier
git clone https://github.com/saintdle/vrops-prometheus-telegraf.git
Information Gathering

Note down the following information:

  • Cluster API Server information
kubectl cluster-info

  • Access details for the Kubernetes cluster
    • Basic Authentication – Uses HTTP basic authentication to authenticate API requests through authentication plugins.
    • Client Certification Authentication – Uses client certificates to authenticate API requests through authentication plugins.
    • Token Authentication – Uses bearer tokens to authenticate API requests through authentication plugins.

In this example I will be using “Client Certification Authentication” using my current authenticated user by running:

kubectl config view --minify --raw
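The raw kubeconfig output is fairly long, so here is a minimal sketch for pulling single values out of it. The function name `kubeconfig_field` is my own (hypothetical), and it assumes a single-cluster, single-user config (which is what `--minify` gives you) where each value sits on a `key: value` line.

```shell
# kubeconfig_field FIELD
# Reads kubeconfig YAML on stdin and prints the value of the first
# "FIELD: value" line it finds. Hypothetical helper, not part of kubectl.
kubeconfig_field() {
  awk -v key="$1:" '$1 == key { print $2; exit }'
}

# Usage (requires cluster access):
#   kubectl config view --minify --raw | kubeconfig_field server
#   kubectl config view --minify --raw | kubeconfig_field client-certificate-data
#   kubectl config view --minify --raw | kubeconfig_field client-key-data
```

The certificate and key values come back exactly as they appear in the kubeconfig (base64 encoded); the sketch does not decode them.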

  • Get your node names and IP addresses
kubectl get nodes -o wide
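If you only want the node names and internal IP addresses from that output, a small sketch like the one below can trim it down. It assumes the default `-o wide` column order, where NAME is the first column and INTERNAL-IP is the sixth; the function name `node_names_and_ips` is hypothetical.

```shell
# node_names_and_ips
# Reads `kubectl get nodes -o wide` output on stdin and prints
# "<node-name> <internal-ip>" per node, skipping the header row.
# Assumes the default column order (NAME first, INTERNAL-IP sixth).
node_names_and_ips() {
  awk 'NR > 1 { print $1, $6 }'
}

# Usage (requires cluster access):
#   kubectl get nodes -o wide | node_names_and_ips
```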

Install the Telegraf Kubernetes Plugin

Continue reading vRealize Operations – Monitoring Kubernetes with Prometheus and Telegraf

Upgrading the vSphere CSI Driver (Storage Container Plugin) from v2.1.0 to latest

In this post I’m just documenting the steps to upgrade the vSphere CSI Driver, especially if you need to jump several versions to reach the latest release.

Upgrade from pre-v2.3.0 CSI Driver version to v2.3.0

You need to figure out what version of the vSphere CSI Driver you are running.

For me it was easy as I could look up the Tanzu Kubernetes Grid release notes. Please refer to your deployment manifests in your cluster. If you are still unsure, contact VMware Support for assistance.

Then you need to find your manifests for your associated version. You can do this by viewing the releases by tag. 

Then remove the resources created by the associated manifests. Below are the commands to remove the version 2.1.0 installation of the CSI.

kubectl delete -f https://raw.githubusercontent.com/kubernetes-sigs/vsphere-csi-driver/v2.1.0/manifests/latest/vsphere-7.0u1/vanilla/deploy/vsphere-csi-controller-deployment.yaml

kubectl delete -f https://raw.githubusercontent.com/kubernetes-sigs/vsphere-csi-driver/v2.1.0/manifests/latest/vsphere-7.0u1/vanilla/deploy/vsphere-csi-node-ds.yaml

kubectl delete -f https://raw.githubusercontent.com/kubernetes-sigs/vsphere-csi-driver/v2.1.0/manifests/latest/vsphere-7.0u1/vanilla/rbac/vsphere-csi-controller-rbac.yaml
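The three commands above share the same base URL, so as a rough sketch they can be generated from one place. The function name `csi_v210_manifest_urls` is hypothetical, and the path layout is copied from the v2.1.0 URLs above; I would not assume other releases use the same layout, so check the tagged release before reusing it.

```shell
# csi_v210_manifest_urls
# Prints the three v2.1.0 manifest URLs used above, one per line.
# The path layout is specific to the v2.1.0 tag as shown in this post.
csi_v210_manifest_urls() {
  base="https://raw.githubusercontent.com/kubernetes-sigs/vsphere-csi-driver/v2.1.0/manifests/latest/vsphere-7.0u1/vanilla"
  for path in \
    deploy/vsphere-csi-controller-deployment.yaml \
    deploy/vsphere-csi-node-ds.yaml \
    rbac/vsphere-csi-controller-rbac.yaml
  do
    printf '%s/%s\n' "$base" "$path"
  done
}

# Usage (requires cluster access):
#   csi_v210_manifest_urls | while read -r url; do kubectl delete -f "$url"; done
```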

Now we need to create the new namespace, “vmware-system-csi”, where all new and future vSphere CSI Driver components will run.

Continue reading Upgrading the vSphere CSI Driver (Storage Container Plugin) from v2.1.0 to latest

Quick Tip – Kubernetes – Delete all evicted pods across all namespaces

I’m currently troubleshooting an issue with my Kubernetes clusters where pods keep getting evicted, and this is happening across namespaces as well.

The problem I now face is keeping on top of them. When I run:

kubectl get pods -A | grep Evicted

I’m presented with hundreds of results.
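To get a feel for where the evictions are concentrated before cleaning anything up, a sketch like this counts the evicted pods per namespace. This is only a summary helper of my own naming (`count_evicted_by_namespace`), and it assumes the default `kubectl get pods -A` column order, with NAMESPACE first and STATUS fourth.

```shell
# count_evicted_by_namespace
# Reads `kubectl get pods -A` output on stdin and prints
# "<namespace> <count>" for every namespace with evicted pods.
# Assumes default columns: NAMESPACE NAME READY STATUS RESTARTS AGE.
count_evicted_by_namespace() {
  awk 'NR > 1 && $4 == "Evicted" { count[$1]++ }
       END { for (ns in count) print ns, count[ns] }' | sort
}

# Usage (requires cluster access):
#   kubectl get pods -A | count_evicted_by_namespace
```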

So to quickly clean this up, I can run the following command:

Continue reading Quick Tip – Kubernetes – Delete all evicted pods across all namespaces

Kubernetes Troubleshooting – Kubelet Unable to attach or mount volumes – timed out waiting for the condition

The Issue

When I updated my Kasten application in my Kubernetes cluster, I found that one of the pods was stuck in “init” status.

dean@dean [ ~ ] (⎈ |tkg-wld-01-admin@tkg-wld-01:default) # k get pods -n kasten-io -w
NAME                                  READY   STATUS     RESTARTS   AGE
aggregatedapis-svc-78564d4697-wl9wg   1/1     Running    0          3m9s
auth-svc-7977b9684b-zph27             1/1     Running    0          3m11s
catalog-svc-7ff7779b75-kmvsr          0/2     Init:0/2   0          2m43s
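In a busy namespace it can be easier to filter for the stuck pods than to scan the list by eye. Here is a minimal sketch, assuming the single-namespace column order shown above (NAME first, STATUS third); the function name `pods_stuck_in_init` is hypothetical.

```shell
# pods_stuck_in_init
# Reads `kubectl get pods -n <namespace>` output on stdin and prints
# the names of pods whose STATUS starts with "Init:".
# Assumes default columns: NAME READY STATUS RESTARTS AGE.
pods_stuck_in_init() {
  awk 'NR > 1 && $3 ~ /^Init:/ { print $1 }'
}

# Usage (requires cluster access):
#   kubectl get pods -n kasten-io | pods_stuck_in_init
```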

Running a describe on that pod pointed to the fact the volume could not be attached.

Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 2m58s default-scheduler Successfully assigned kasten-io/catalog-svc-7ff7779b75-kmvsr to tkg-wld-01-md-0-54598b8d99-rpqjf
Warning FailedMount 55s kubelet Unable to attach or mount volumes: unmounted volumes=[catalog-persistent-storage], unattached volumes=[k10-k10-token-lbqpw catalog-persistent-storage]: timed out waiting for the condition
The Cause

Somewhere along the line I found some stale VolumeAttachments linked to Kubernetes nodes that no longer exist in my cluster. This appears to confuse the cluster about which node should be attaching the volume.

The image below shows:

  • Find the Persistent Volume name linked to the associated claim for the failure in the pod events
  • Map this to the available VolumeAttachments
  • Reference VolumeAttachments for each node to available nodes in the cluster
    • I’ve highlighted the missing node in the red box

kubectl get pv - get volumeattachment - get nodes
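The cross-referencing of VolumeAttachments against nodes can also be sketched as a small script rather than done by eye. This is an illustration under my own assumptions, not the exact process I followed: the function name `find_stale_attachments` and the two file names are hypothetical, and the input files are expected to hold "<volumeattachment-name> <node-name>" lines and one node name per line respectively.

```shell
# find_stale_attachments ATTACHMENTS_FILE NODES_FILE
# Prints the name of every VolumeAttachment whose node does not appear
# in NODES_FILE.
#   ATTACHMENTS_FILE: lines of "<volumeattachment-name> <node-name>"
#   NODES_FILE:       one node name per line
find_stale_attachments() {
  awk 'FILENAME == ARGV[1] { nodes[$1] = 1; next }
       !($2 in nodes)      { print $1 }' "$2" "$1"
}

# Usage (requires cluster access; file names are placeholders):
#   kubectl get volumeattachment --no-headers \
#     -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName > attachments.txt
#   kubectl get nodes --no-headers -o custom-columns=NAME:.metadata.name > nodes.txt
#   find_stale_attachments attachments.txt nodes.txt
```

Anything it prints is only a candidate for removal; I would still confirm each one against the Persistent Volume from the pod events before deleting it.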

The Fix

The fix is to remove the stale VolumeAttachment.

kubectl delete volumeattachment [volumeattachment_name]

After this your pod should eventually pick up and retry, or you could remove the pod and let Kubernetes replace it for you (so long as it’s part of a deployment or other configuration managing your application).

Regards

Dean Lewis

MongoDB Container data loss issue – A Journey

Over the past month or so I noticed an issue with my Pac-Man Kubernetes application, which I use for demonstrations as a basic app front-end that writes to a database back end, running in Kubernetes.

  • When I restored my instances using Kasten, my Pac-Man high scores were missing.
  • This issue happened when I made some changes to my deployment files to configure authentication to the MongoDB using environment variables in my deployment file.

This blog post is a detailed walk-through of the steps I took to troubleshoot the issue, and then rectify it!

Summary if you don’t want to read the post

If you are not looking to read through this blog post, here is the summary:

  • I changed MongoDB images, so I needed to configure a new mount point location to match the MongoDB configuration
  • The new MongoDB image runs as non-root, so I had to use an init container to configure the permissions on the PV first
Overview of the application

The application is made up of the following components:

  • Namespace
  • Deployment
    • MongoDB Pod
      • DB Authentication configured
      • Attached to a PVC
    • Pac-Man Pod
      • Node.js web front end that connects back to the MongoDB Pod by looking for the Pod DNS address internally.
  • RBAC Configuration for Pod Security and Service Account
  • Secret which holds the data for the MongoDB Usernames and Passwords to be configured
  • Service
    • Type: LoadBalancer
      • Used to balance traffic to the Pac-Man Pods

Pac-Man Kubernetes Diagram

Confirming the behaviour

The behaviour I was seeing when my application was deployed:

  • Pac-Man web page – I could save a high score, and it would show in the high scores list
    • This showed the connectivity to the database was working, as the app would hang if it could not write to the database.
  • I would protect my application using Kasten. When I deleted the namespace and restored everything, my application would be running, but there were no high scores to show.
  • This was apparent when deploying branch versions v0.5.0 and v0.5.1 from my GitHub.
  • Deploying the branch v0.2.0 would not produce the same behaviour
    • This configuration did not have any database authentication set up, meaning MongoDB was open to anyone who could connect, with no username or password required.
Testing the Behaviour

Continue reading MongoDB Container data loss issue – A Journey