In years gone by, costing of your technology platforms was covered by a product called vRealize Business for Cloud. With the move to the 8.x code base, this product was made end-of-life (EOL).
The main functions where customers saw value, providing costings for your datacenter and virtual machines, were wrapped up into vRealize Operations.
This blog post takes a deep dive into the costing capabilities within vRealize Operations across your on-premises datacenters, and what happens when you start to consume VMware-on-hyperscaler solutions, such as VMware Cloud on AWS (VMC).
Configure the Global Currency Setting
The first action is setting the global currency for the vRealize Operations instance. There are two important things to note when undertaking this configuration:
This can only be set once
This setting cannot be changed once it is set
To configure:
Click on Administration
Click on the Global Settings Tile
Click on the Cost/Price heading
Click “Set currency”
Select your currency from the list and click “Set Currency”.
A dialog confirms that the configuration has taken place.
Below you can see that the setting is now in place, and there is no button or clickable option to change it.
Configuring Cost Settings
Now that the global currency is configured, we can start configuring all the cost settings for our Datacenter platforms.
vRealize Operations delivers intelligent operations management with application-to-storage visibility across physical, virtual, and cloud infrastructures. Using policy-based automation, operations teams automate key processes and improve IT efficiency.
Prometheus is an open-source systems monitoring and alerting toolkit. Prometheus collects and stores its metrics as time series data, i.e. metrics information is stored with the timestamp at which it was recorded, alongside optional key-value pairs called labels.
There are several libraries and servers which help in exporting existing metrics from third-party systems as Prometheus metrics. This is useful for cases where it is not feasible to instrument a given system with Prometheus metrics directly (for example, HAProxy or Linux system stats).
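To make the Prometheus data model described above more concrete, here is a minimal Python sketch that renders one sample (metric name, labels, value, timestamp) as a line of the Prometheus text exposition format, which is what an exporter serves for scraping. The metric name and label values are made up for illustration, and the sketch assumes label values contain no characters that need escaping.

```python
def format_sample(name, value, labels=None, timestamp_ms=None):
    """Render one sample as a line of Prometheus text exposition format.

    Assumption: label values contain no quotes, backslashes or newlines,
    so no escaping is performed here.
    """
    label_text = ""
    if labels:
        # Sort the labels so the output is deterministic.
        pairs = ",".join(f'{key}="{val}"' for key, val in sorted(labels.items()))
        label_text = "{" + pairs + "}"
    line = f"{name}{label_text} {value}"
    if timestamp_ms is not None:
        # The optional timestamp is in milliseconds since the Unix epoch.
        line += f" {timestamp_ms}"
    return line


sample = format_sample(
    "node_cpu_usage_percent",  # hypothetical metric name
    42.5,
    labels={"node": "worker-1", "cluster": "demo"},  # hypothetical labels
    timestamp_ms=1700000000000,
)
print(sample)
```

In practice you would normally rely on an existing exporter or client library rather than formatting these lines by hand; the sketch is only meant to show what a labelled, timestamped sample looks like.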
Telegraf is a plugin-driven server agent written by the folks over at InfluxData for collecting & reporting metrics. By using the Telegraf exporter, the following Kubernetes metrics are supported:
You can actually achieve this with two products: vRealize Operations and a metric exporter (cAdvisor, for example) from which the data can be collected in the Kubernetes cluster. By default, Kubernetes offers little in the way of metrics data until you install an appropriate package to provide it.
Many customers have now decided upon using Prometheus for their metrics needs in their Modern Applications world due to the flexibility it offers.
Therefore, this integration provides a way for vRealize Operations to collect the data through an existing Prometheus deployment and enrich the data further by providing a context-aware relationship view between your virtualisation platform and the Kubernetes platform which runs on top of it.
vRealize Operations Management Pack for Kubernetes supports a number of Prometheus exporters that provide the relevant data. In this blog post we will focus on Telegraf.
You can view sample deployments here for all the supported types. This blog will show you an end-to-end setup and deployment.
Prerequisites
Administrative access to a vRealize Operations environment
Install the “vRealize Operations Management Pack for Kubernetes”
Whilst reading some of the older vRealize Operations documentation, I stumbled on something I didn’t think was possible.
The ability to create interactions between separate dashboards.
At first, I thought this could not be correct. I don’t remember seeing this option. But sure enough, it’s there. So, I thought I’d write a quick blog about it and share it with the world.
You can apply sections or context from one dashboard to another. You can connect widgets and views to widgets and views in the same dashboard or to other dashboards to investigate problems or better analyze the provided information.
First, I’ve created two dashboards, which are based on the old troubleshooting dashboards. Both dashboards have an Object Picker List to filter the various related objects on each dashboard.
Dashboard-1 – Troubleshoot Cluster
Dashboard-2 – Troubleshoot VM
The premise is simple: when I select a Cluster object in Dashboard-1, I want the list of VMs in Dashboard-2 to be filtered to only those in the selected Cluster.
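The filtering behaviour described in the premise can be sketched in a few lines of Python. This is only a conceptual model of what the interaction achieves, not how vROPs implements it; the `VM` class, the inventory and the cluster names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VM:
    """Hypothetical, simplified VM record: a name and its parent cluster."""
    name: str
    cluster: str


def vms_in_cluster(vms, selected_cluster):
    """Return only the VMs that belong to the selected cluster."""
    return [vm for vm in vms if vm.cluster == selected_cluster]


# Made-up inventory standing in for the objects shown on Dashboard-2.
inventory = [
    VM("web-01", "Cluster-A"),
    VM("db-01", "Cluster-B"),
    VM("web-02", "Cluster-A"),
]

# Selecting "Cluster-A" on Dashboard-1 narrows the VM list on Dashboard-2.
filtered = vms_in_cluster(inventory, "Cluster-A")
print([vm.name for vm in filtered])
```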
Import the files appended with “view” under the Views section in vROPs.
Import the file appended with “Dashboard” under the Dashboards section in vROPs.
Dashboard Breakdown
First Item – This is a list which I’ve created to show each cluster and the total VM metric, with some expressions attached. The timescale here is fixed by the list view and is not affected by the dashboard timeframe. The change is an expression based on the count of VMs at the start and end of the timeframe. I’ve added some basic colouring to alert at thresholds.
Why does it say vCPUs? When using expressions, it requires a Unit to be affixed. This doesn’t work if you’re counting something, so in our next release, this issue should be addressed. It’s purely a vanity thing.
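As a rough illustration of the change expression and threshold colouring in the first item, here is a small Python sketch. It assumes the change is simply the VM count at the end of the timeframe minus the count at the start; the threshold values and colour names are hypothetical and are not taken from the view itself.

```python
def vm_count_change(samples):
    """Difference between the last and first VM count in the timeframe.

    `samples` is a list of (timestamp, total_vms) tuples in any order.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    ordered = sorted(samples)  # sort by timestamp
    first_count = ordered[0][1]
    last_count = ordered[-1][1]
    return last_count - first_count


def colour_for_change(change, warn_at=5, critical_at=10):
    """Map the size of the change to a colour (thresholds are hypothetical)."""
    magnitude = abs(change)
    if magnitude >= critical_at:
        return "red"
    if magnitude >= warn_at:
        return "yellow"
    return "green"


# Made-up samples: (day number, total VMs in the cluster).
samples = [(3, 48), (1, 40), (2, 45)]
change = vm_count_change(samples)
print(change, colour_for_change(change))
```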
Second Item – This shows the VMs attached to the cluster you select on the left-hand side, showing you how old that VM is, its uptime and current power state.
Third is a Sparkline – Showing an easy view of the changes in total VMs per cluster over a 7-day period (as defined by the dashboard’s time scale).
Fourth item is a trend graph, showing the changes in the Total VM metric based on the data we have, along with the trend/forecast. How far the trend extends into the future is set within the item itself. Currently we can set this to show the forecast for up to the next 366 days.
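To give a feel for what a trend/forecast line represents, here is a minimal Python sketch that fits a straight line to a week of VM counts and projects it 366 days ahead. This uses ordinary least squares purely as a stand-in; the forecasting method vROPs actually uses is not described here, and the sample counts are made up.

```python
def fit_line(days, counts):
    """Ordinary least-squares fit of counts = slope * day + intercept."""
    n = len(days)
    if n < 2 or n != len(counts):
        raise ValueError("need at least two (day, count) pairs")
    mean_x = sum(days) / n
    mean_y = sum(counts) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    if sxx == 0:
        raise ValueError("all samples share the same day")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, counts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast(days, counts, days_ahead):
    """Project the fitted line `days_ahead` days past the last sample."""
    slope, intercept = fit_line(days, counts)
    return slope * (days[-1] + days_ahead) + intercept


# Made-up data: total VMs in a cluster over 7 days, growing by 2 per day.
days = [0, 1, 2, 3, 4, 5, 6]
counts = [100, 102, 104, 106, 108, 110, 112]

projected = forecast(days, counts, days_ahead=366)
print(round(projected))
```

A straight line over such a long horizon is a blunt instrument, so treat any long-range projection as indicative rather than precise.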
vROPs versions
The VM creation date metric is available in vROPs 8.2 and later. This dashboard/view should work with older versions of vROPs, but will omit the data for the missing metric.
I had someone query the below metrics. The answer, although easy to assume, is not clearly written down, and within vROPs you don’t get a description either, so I thought I’d publish it in case any inquisitive minds go googling.
Guest|Page In Rate
The rate at which the Guest OS brings memory back from disk to DIMM, per second. Basically, the rate of reads going through the paging/cache system.
It includes not just swapfile I/O, but cacheable reads as well (double pages/s). A page that was paged out earlier has to be brought back before it can be used. This creates a performance issue, because the application waits longer: disk is much slower than RAM.
The unit is number of pages, not MB. It’s not possible to convert between the two due to the mixed use of Large Pages (2 MB) and regular Pages (4 KB).
A process can have concurrent mixed usage of Large and non-Large page in Windows. The page size isn’t a system-wide setting that all processes use. The same is likely true for Linux Huge Pages.
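Although an exact conversion is not possible, you can at least bound the throughput implied by a page rate. The Python sketch below assumes every page is either a 4 KB page or a 2 MB Large Page (the two sizes mentioned above) and returns the lowest and highest possible rate in MiB per second; the function name is my own.

```python
SMALL_PAGE_BYTES = 4 * 1024          # regular 4 KB page
LARGE_PAGE_BYTES = 2 * 1024 * 1024   # 2 MB Large Page
BYTES_PER_MIB = 1024 * 1024


def throughput_bounds_mib(pages_per_second):
    """Return (low, high) MiB/s for a given page rate.

    low  assumes every page is a 4 KB page,
    high assumes every page is a 2 MB Large Page.
    The real figure lies somewhere in between and cannot be derived
    from the page count alone.
    """
    low = pages_per_second * SMALL_PAGE_BYTES / BYTES_PER_MIB
    high = pages_per_second * LARGE_PAGE_BYTES / BYTES_PER_MIB
    return low, high


low, high = throughput_bounds_mib(256)
print(low, high)
```

The gap between the two bounds is a factor of 512, which is why the metric is best read as a page count rather than converted to a data rate.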
Windows
Pages Input/sec counter
Pages Input/sec is the rate at which pages are read from disk to resolve hard page faults. Hard page faults occur when a process refers to a page in virtual memory that is not in its working set or elsewhere in physical memory, and must be retrieved from disk. When a page is faulted, the system tries to read multiple contiguous pages into memory to maximize the benefit of the read operation. Compare the value of Memory\Pages Input/sec to the value of Memory\Page Reads/sec to determine the average number of pages read into memory during each read operation.
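The comparison described above is a simple ratio. As a minimal Python sketch (the function name and sample values are made up; the two inputs correspond to the Memory\Pages Input/sec and Memory\Page Reads/sec counters):

```python
def average_pages_per_read(pages_input_per_sec, page_reads_per_sec):
    """Average number of pages brought into memory per read operation.

    Returns None when no read operations occurred in the interval,
    because the ratio is undefined in that case.
    """
    if page_reads_per_sec == 0:
        return None
    return pages_input_per_sec / page_reads_per_sec


# Made-up sample: 120 pages/s read in by 30 read operations/s.
print(average_pages_per_read(120, 30))
```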
Guest|Page Out Rate
The opposite of the above: the rate at which the Guest OS moves memory out to disk. This is not as important as the page in rate. Just because a block of memory is moved to disk does not mean the application experiences a memory problem. In many cases, the page that was moved out is an idle page. Windows does not page out any Large Pages.
Windows
Pages Output/sec counter
Pages Output/sec is the rate at which pages are written to disk to free up space in physical memory. Pages are written back to disk only if they are changed in physical memory, so they are likely to hold data, not code. A high rate of pages output might indicate a memory shortage. Windows writes more pages back to disk to free up space when physical memory is in short supply. This counter shows the number of pages, and can be compared to other counts of pages, without conversion.
Linux
Pages Swapped Out counter
Final notes
Page in/out rate includes pages written/read to/from swap file as well as other system files.
It is important to remember these metrics are populated by pulling the data from the performance counters of the Guest OS, hence the need for VMware Tools. These metrics should not be confused with virtual machine metrics, which are based on the activity of the VM at the vSphere level and therefore do not take into account what is going on inside the guest itself.
Thanks to Iwan “e1” Rahabok, whose blog post here helped me figure this out as well.