vRealize Operations - Monitoring Kubernetes with Prometheus & Telegraf

vRealize Operations delivers intelligent operations management with application-to-storage visibility across physical, virtual, and cloud infrastructures. Using policy-based automation, operations teams can automate key processes and improve IT efficiency.

Prometheus

Prometheus is an open-source systems monitoring and alerting toolkit. Prometheus collects and stores its metrics as time series data, i.e. metrics information is stored with the timestamp at which it was recorded, alongside optional key-value pairs called labels.

There are several libraries and servers which help in exporting existing metrics from third-party systems as Prometheus metrics. This is useful for cases where it is not feasible to instrument a given system with Prometheus metrics directly (for example, HAProxy or Linux system stats).

Telegraf is a plugin-driven server agent written by the folks over at InfluxData for collecting & reporting metrics. By using the Telegraf exporter, the following Kubernetes metrics are supported:

https://github.com/influxdata/telegraf/tree/master/plugins/inputs/kubernetes#metrics

Why do it this way with three products?

You can actually achieve this with two products (vROps and cAdvisor, for example): vRealize Operations plus a metric exporter in the Kubernetes cluster that the data can be grabbed from. By default, Kubernetes offers little in the way of metrics data until you install an appropriate package to provide it.

Many customers have now decided upon using Prometheus for their metrics needs in their Modern Applications world due to the flexibility it offers.

Therefore, this integration provides a way for vRealize Operations to collect the data through an existing Prometheus deployment and enrich the data further by providing a context-aware relationship view between your virtualisation platform and the Kubernetes platform which runs on top of it.

vRealize Operations Management Pack for Kubernetes supports a number of Prometheus exporters as the source of the relevant data. In this blog post we will focus on Telegraf.

You can view sample deployments here for all the supported types. This blog will show you an end-to-end setup and deployment.

Prerequisites

Administrative access to a vRealize Operations environment
- Install the “vRealize Operations Management Pack for Kubernetes”
  - Official Documentation
  - Marketplace Download Page (sign in required for free download)
Access to a Kubernetes cluster that you want to monitor
Install Helm if you have not already got it set up on the machine which has access to your Kubernetes cluster
Clone this GitHub repo to your machine to make life easier

git clone https://github.com/saintdle/vrops-prometheus-telegraf.git

Information Gathering

Note down the following information:

Cluster API Server information

kubectl cluster-info

Access details for the Kubernetes cluster
- Basic Authentication – Uses HTTP basic authentication to authenticate API requests through authentication plugins.
- Client Certification Authentication – Uses client certificates to authenticate API requests through authentication plugins.
- Token Authentication – Uses bearer tokens to authenticate API requests through authentication plugins.

In this example I will be using “Client Certification Authentication” using my current authenticated user by running:

kubectl config view --minify --raw
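The certificate values can be pulled out of that output rather than copied by hand. This is a minimal sketch, assuming your kubeconfig stores them inline under keys such as `client-certificate-data` and `client-key-data` (some kubeconfigs reference files instead). `kubeconfig_field` is a hypothetical helper name of my own, not part of kubectl, and which fields the adapter credential asks for is best checked against the official documentation.

```shell
# kubeconfig_field KEY
# Reads kubeconfig YAML on stdin and prints the value of the first
# line whose key is exactly KEY (hypothetical helper, not a kubectl feature).
kubeconfig_field() {
  awk -v key="$1:" '$1 == key { print $2; exit }'
}

# Usage against a live cluster (commented out so the snippet runs without kubectl):
# kubectl config view --minify --raw | kubeconfig_field client-certificate-data
# kubectl config view --minify --raw | kubeconfig_field client-key-data
```

The value is printed exactly as it appears in the kubeconfig, i.e. still base64 encoded.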

Get your node names and IP addresses

kubectl get nodes -o wide
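If you only want the node names and internal IPs from that output, a small filter can trim it down. This is a sketch that assumes the wide output has a header row containing an `INTERNAL-IP` column; `node_ips` is a hypothetical helper name.

```shell
# node_ips
# Reads `kubectl get nodes -o wide` output on stdin and prints
# "<node name> <internal ip>" for each node. The INTERNAL-IP column
# is located from the header row rather than hard-coded.
node_ips() {
  awk 'NR == 1 { for (i = 1; i <= NF; i++) if ($i == "INTERNAL-IP") c = i; next }
       c       { print $1, $c }'
}

# Usage against a live cluster (commented out so the snippet runs without kubectl):
# kubectl get nodes -o wide | node_ips
```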

Install the Telegraf Kubernetes Plugin

Copy the correct Linux or Windows (depending on your container node OS) Telegraf plugin from the vRealize Management pack documentation page, to your machine:

Telegraf Kubernetes Plugin Setup for Windows And Linux
Backup copy here for the Linux version

Alternatively, you should have cloned my repo, and it will be there too.

The reason I ask you to copy this down locally is the use of variables in the YAML file. If you paste the contents into the shell to create the Kubernetes config using something like:
    
    kubectl create -f - << EOF
    something
    EOF

the Linux shell replaces these variables with values from the environment. If there is no matching variable, it is replaced with an empty string, so $NODE_IP, for example, gets wiped out.
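The behaviour is easy to reproduce without touching a cluster. The sketch below uses `cat` in place of kubectl and a one-line stand-in for the YAML file. It also shows that quoting the heredoc delimiter stops the shell from expanding variables, which is an alternative if you do want to paste the file contents.

```shell
# Make sure the variable is not set in this shell, as in the scenario above.
unset NODE_IP

# Unquoted delimiter: the shell expands $NODE_IP before the text is passed on.
expanded=$(cat << EOF
value: $NODE_IP
EOF
)

# Quoted delimiter: the text is passed through literally.
literal=$(cat << 'EOF'
value: $NODE_IP
EOF
)

echo "unquoted delimiter -> $expanded"
echo "quoted delimiter   -> $literal"
```

With the unquoted delimiter the variable reference disappears, while the quoted form keeps `$NODE_IP` intact for Kubernetes to deal with.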

Apply the configuration:

kubectl create -f Telegraf\ Kubernetes\ Plugin\ for\ Linux.yaml

You can then monitor the pods with the command below. As this installs a DaemonSet, a pod will run on each node in the cluster:

kubectl get pods -n kube-system -l app=vrops-telegraf-k8s
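On a larger cluster it can help to list only the pods that are not yet healthy. This is a sketch that assumes the default `kubectl get pods` column layout, where STATUS is the third column; `not_running` is a hypothetical helper name.

```shell
# not_running
# Reads `kubectl get pods` output on stdin and prints the name of every
# pod whose STATUS column is anything other than "Running".
not_running() {
  awk 'NR > 1 && $3 != "Running" { print $1 }'
}

# Usage against a live cluster (commented out so the snippet runs without kubectl):
# kubectl get pods -n kube-system -l app=vrops-telegraf-k8s | not_running
```

No output means every Telegraf pod in the DaemonSet reports Running.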

If you need to troubleshoot the Telegraf Kubernetes plugin you can run the following command:

kubectl logs {pod_name} -n kube-system
# Example
kubectl logs vrops-telegraf-k8s-4wrvt -n kube-system

Install Prometheus Server and configure the exporter

Now we are going to install Prometheus into our Kubernetes cluster. Deploying it on a virtual machine outside of your Kubernetes cluster is also supported, and in a larger environment this might be preferable.

I have provided a cut-down Helm values file in my GitHub repo which configures the following:

Install only “Prometheus Server”
- Configure service using a Load Balancer
Set up a config map that creates the “prometheus.yml” file with the correct scrape configuration

You will need to edit this file to include the node details we collected earlier; this allows Prometheus to scrape the data from the Telegraf exporter.

Official documentation – Add Telegraf Exporter

Alternatively, you can update the Config Map after Prometheus is deployed using the following command:

kubectl edit configmap prometheus-server -n {namespace}

The approach below lets you set up this configuration as part of the Helm install command, which I personally think makes life easier.

# Under the scrape_configs block
    scrape_configs:

# if you want to use the node-exporter default plugin, you need one of these blocks for each node in your cluster
    - job_name: 'node-exporter'
      static_configs:
      - targets: ['node_ip:9100' ]
        labels:
          nodename: 'nodename'

# Job to scrape the Telegraf exporter. You can add multiple values to the targets block. Use port 31196 for Linux, port 31197 for Windows
    - job_name: 'telegraf-exporter'
      static_configs:
      - targets: ['node_ip_01:31196', 'node_ip_02:31196', 'node_ip_03:31196' ]
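With more than a handful of nodes, typing the targets list by hand gets error-prone. As a sketch, a small shell function can build the line from a port and a list of node IPs. `make_targets` is a hypothetical helper name, and the indentation it prints is only an assumption that matches the snippet above, so adjust it to fit your values file.

```shell
# make_targets PORT IP [IP...]
# Prints a Prometheus static_configs "targets" line for the given
# node IPs, all on the same port (hypothetical helper).
make_targets() {
  port=$1
  shift
  out=""
  for ip in "$@"; do
    # Append "'ip:port'", separated by ", " after the first entry.
    out="${out:+$out, }'$ip:$port'"
  done
  printf "      - targets: [%s]\n" "$out"
}

# Example: three Linux nodes exposing the Telegraf exporter on port 31196.
make_targets 31196 192.168.200.177 192.168.200.174 192.168.200.186
```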

Here is my example config from line 509 in my example values file, based on the information collected at the start of the blog post:

  prometheus.yml:
    scrape_configs:
    - job_name: 'node-exporter'
      static_configs:
      - targets: ['192.168.200.176:9100' ]
        labels:
          nodename: 'tkg-wld-01-control-plane-2f7bt'
      - targets: ['192.168.200.180:9100' ]
        labels:
          nodename: 'tkg-wld-01-md-0-696f98994f-4bjv6'
      - targets: ['192.168.200.184:9100' ]
        labels:
          nodename: 'tkg-wld-01-md-0-696f98994f-h7ks5'
      - targets: ['192.168.200.185:9100' ]
        labels:
          nodename: 'tkg-wld-01-md-0-696f98994f-zhscz'
      - targets: ['192.168.200.187:9100' ]
        labels:
          nodename: 'tkg-wld-01-md-0-696f98994f-v56p9'
    - job_name: 'telegraf-exporter'
      static_configs:
      - targets: ['192.168.200.177:31196', '192.168.200.174:31196', '192.168.200.186:31196' ]

Now to install Prometheus using the provided values file.

# Add the Prometheus Community Repo to helm

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts

# Create a namespace to install Prometheus into

kubectl create namespace monitoring

# Install Prometheus, this example uses my cut down values file

helm install prometheus prometheus-community/prometheus -n monitoring --values prometheus_values.yaml

You can watch the pods come up with the command:

kubectl get pods -n monitoring

You can get the external address with the commands below, which are output by the Helm install.

You will need this address for vRealize Operations Kubernetes adapter configuration.

Get the Prometheus server URL by running these commands in the same shell:

NOTE: It may take a few minutes for the LoadBalancer IP to be available.
You can watch the status of by running 'kubectl get svc --namespace monitoring -w prometheus-server'

export SERVICE_IP=$(kubectl get svc --namespace monitoring prometheus-server -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
echo http://$SERVICE_IP:80

To validate everything has come up, go to the Prometheus address. In my configuration there is no authentication.

In my screenshot below, you can see I have 3 of 3 targets reporting into Prometheus rather than all five nodes. This is because I forgot to add my other two node IP addresses to the scrape details in the config map.

I can correct that, as mentioned above, by running the command below. I also fixed my node-exporter job by removing it, as it needs additional configuration in the environment for a control-plane node. In this article we are only concerned with the Telegraf exporter:

kubectl edit configmap prometheus-server -n monitoring
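Once Prometheus has reloaded the change, the target count can also be checked from the command line through the Prometheus HTTP API (`/api/v1/targets`). This is a rough sketch: `count_up` is a hypothetical helper, and it assumes the API returns compact JSON with no spaces around the colon. A JSON-aware tool such as jq would be more robust if you have it installed.

```shell
# count_up
# Reads the JSON body of Prometheus' /api/v1/targets endpoint on stdin
# and prints how many targets currently report health "up".
count_up() {
  grep -o '"health":"up"' | wc -l | tr -d ' '
}

# Usage, reusing SERVICE_IP from the Helm notes (commented out so the
# snippet runs without a Prometheus server):
# curl -s "http://$SERVICE_IP:80/api/v1/targets" | count_up
```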

Configure vRealize Operations to monitor Kubernetes via Prometheus

Log into your vRealize Operations environment.

Upload and install the management pack

Now let’s configure the adapter.

Click on Integrations (if using an older version of vROps, you’ll need to click on Administration first)
Click Add Account

Select Kubernetes

Set the name for the adapter
Set the cluster API server address you collected at the information gathering stage
Set the Collector Service to Prometheus
Click the “+” symbol to create a new credential
- Select the Credential Type
- Set a credential name
- Set the authentication details for the selected credential type
- Set the Prometheus Server address using either HTTP or HTTPS and the associated port
- Provide any necessary authentication with Prometheus
Open the Advanced Settings and set the vCenter server to map the Kubernetes cluster objects to your virtual infrastructure.
Validate your connection; you will be asked to accept the SSL certificate from the Kubernetes API.
Click Add adapter

You may see the adapter sit in a warning status for a while whilst the initial data collection is running. After a few minutes my environment turned to OK.

Viewing Kubernetes Telegraf Metrics in vRealize Operations

I won’t go into massive detail about all of the views and dashboards, as I covered most of this in an earlier blog post. However, the power of the Telegraf exporter is the ability to get container metrics for CPU, memory, network, and storage resource usage.

Below you can see on the left-hand navigation pane the mapping of all the Kubernetes objects through to the virtualisation components where they run (mapped to a virtual machine, host, and so on).

In the main metrics screen, I can see Prometheus as a heading, then Telegraf Exporter. I’ve charted out some metrics from the “ArgoCD Server” container for CPU and Memory usage.

And looking at the same container for networking metrics as well.

Finally, we can see storage metrics from the point of view of the container as well. As this is my lab environment, not much happens. However, I changed the context to look at my Kasten namespace and viewed the persistent volume attached to the catalog services, as I knew this would show some data usage for the screenshot below.

Summary and wrap-up

I wrote this blog to add more clarity to the official documentation when using Prometheus as an endpoint to pull Kubernetes metrics into vRealize Operations. Of course, you can also choose your exporter of choice, and potentially use other exporters to pull extra information into the system about your Kubernetes clusters and applications.

The key takeaway for me here is the ability to provide the infrastructure-up view from your virtualisation platform into your Kubernetes cluster. Should you have to work with your Apps team, troubleshooting hopefully has become a little simpler as you now have visibility into their world and systems.

Regards

Dean Lewis

vEducate.co.uk