Tag Archives: Observability


Cilium Network Policies, from first principles to production

This post teaches the Cilium policy model with clear scenarios and annotated YAML. It matches the style of practical technical blogs, explanation first and code second, with links to the official docs where you will want deeper detail.

Why Cilium policy

Kubernetes’ built-in NetworkPolicy objects define which pods can communicate using label-based rules at the IP and port level. This provides basic isolation, but it stops short of deeper visibility or intent-based control.

Cilium builds on this foundation by introducing security identities derived from labels. These identities represent workloads consistently across nodes and are enforced directly in the kernel using eBPF. Because enforcement happens in the datapath, policies remain accurate and efficient even as workloads scale or IPs change.

Beyond IP and port filtering, Cilium understands application context such as DNS names and HTTP methods and paths. This makes it possible to express policies in human terms — for example, “allow only GET requests on /health from pods with role=frontend” or “allow egress only to api.partner.com.”

Together, these capabilities create a single, consistent model for enforcing and observing network behavior across all workloads. This post walks through that model step by step, with practical YAML examples you can apply to your own environment.

For further reading, see the official Cilium policy overview for the complete language reference and selector options.

Mental model

Every policy answers four questions: where to enforce, which direction to guard, who may talk, and whether to apply checks at the application layer.

These are the top things to keep in mind when defining a Cilium Network Policy.

  • Subject: choose pods with endpointSelector or nodes with nodeSelector
  • Direction: if a selected subject has an ingress list, then ingress becomes default deny for that subject; the same idea applies to egress
  • Peers: choose with fromEndpoints, toEndpoints, toEntities, toCIDRSet, toFQDNs, toServices
  • Application layer: add an optional rules block under toPorts for HTTP or DNS

Language details are in the Cilium policy language.
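
To make the four parts concrete, here is a minimal sketch that writes a CiliumNetworkPolicy to a file, based on the “GET /health from role=frontend” scenario mentioned earlier. The policy name, the demo namespace, the app=backend label, and port 8080 are assumptions for illustration, not values from a real cluster.

```shell
# Hypothetical example: the name, namespace, labels, and port are placeholders.
policy_file="${TMPDIR:-/tmp}/allow-frontend-health.yaml"

cat > "$policy_file" <<'EOF'
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-frontend-health
  namespace: demo
spec:
  # Subject: the pods this policy is enforced on
  endpointSelector:
    matchLabels:
      app: backend
  # Direction: an ingress list makes ingress default deny for the subject
  ingress:
    # Peers: only pods labelled role=frontend may connect
    - fromEndpoints:
        - matchLabels:
            role: frontend
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
          # Application layer: only GET /health is allowed on this port
          rules:
            http:
              - method: "GET"
                path: "/health"
EOF

# Review the file, then apply it with: kubectl apply -f "$policy_file"
cat "$policy_file"
```

Nothing is sent to the cluster until you run kubectl apply yourself, so this is safe to try on any machine.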

We typically refer to the security policies implemented in Cilium collectively as “Cilium Network Policy”. However, when you start using them in your platform, you will find there are in fact two types of policy object to be aware of. Most of the information in this post applies to both types, but keep the following in mind:

  • CiliumNetworkPolicy (CNP) is the namespaced policy object you apply to control traffic for pods within a single namespace.
  • CiliumClusterwideNetworkPolicy (CCNP) is the cluster-scoped version. It uses the same language and selectors but applies across all namespaces, which is useful for node policies, global DNS interception, or rules that span multiple teams.

What a Cilium endpoint is

Every pod (and any process that Cilium manages traffic for) is represented inside Cilium as an endpoint. An endpoint is essentially Cilium’s view of a workload: its labels, Security Identity, policies, and network state.

When you write a policy with an endpointSelector, you’re telling Cilium “apply this rule to the endpoints whose labels match this selector.” Cilium uses that to program the eBPF datapath on the node where each endpoint lives.

You can see endpoints on a node with:

kubectl -n kube-system exec -ti ds/cilium -- cilium endpoint list

Each row in the resulting table is one Cilium endpoint. An endpoint represents a pod or other workload that Cilium is managing on that node. The columns tell you at a glance what Cilium knows about it and how policies apply.

  • ENDPOINT: This is Cilium’s internal endpoint ID on that node. In this case, 24. You’ll use this ID if you run cilium endpoint get 24 for detailed info.

  • POLICY (ingress / egress): Shows whether policy enforcement is active on this endpoint. “Disabled” means there are no Cilium policies selecting it yet for that direction, so all traffic is allowed. Once you create a CiliumNetworkPolicy with an ingress or egress section matching this endpoint, this field will flip to “Enabled”.

  • IDENTITY: The numeric Security Identity assigned to the set of identity-relevant labels for this endpoint (69014 here). Cilium uses this number in the datapath to represent the workload.

  • LABELS (source:key=value): The full list of labels that Cilium knows for this endpoint. The prefix shows where the label came from (k8s: means Kubernetes label). These are the labels you match on in your policies. In the example, it includes app=minio, the namespace, the service account, and some Helm-related labels.

  • IPv4 / IPv6: The IP addresses currently assigned to that pod. Notice you never use them directly in your policies; Cilium maps them to the Security Identity automatically. Note: a Cilium Network Policy can also filter on CIDR ranges, but this is not recommended for pod traffic inside the cluster.

  • STATUS: Shows the endpoint’s state from Cilium’s perspective (“ready” means it’s healthy and being managed).

ENDPOINT   POLICY (ingress)   POLICY (egress)   IDENTITY   LABELS (source:key[=value])                                                                IPv6   IPv4         STATUS   
           ENFORCEMENT        ENFORCEMENT                                                                                                                                 
24         Disabled           Disabled          69014      k8s:app.kubernetes.io/managed-by=Helm                                                             10.0.0.247   ready   
                                                           k8s:app=minio                                                                                                          
                                                           k8s:io.cilium.k8s.namespace.labels.kubernetes.io/metadata.name=minio                                                   
                                                           k8s:io.cilium.k8s.policy.cluster=kind                                                                                  
                                                           k8s:io.cilium.k8s.policy.serviceaccount=quickstart-sa                                                                  
                                                           k8s:io.kubernetes.pod.namespace=minio                                                                                  
                                                           k8s:v1.min.io/console=quickstart-console                                                                               
                                                           k8s:v1.min.io/pool=ss-0

This view is invaluable when troubleshooting, and we’ll cover this towards the end of the blog post.
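
When a node runs many pods, the label continuation lines make the table hard to scan. As a rough sketch, you can let awk keep only the rows that start with a numeric endpoint ID and print the enforcement state and identity. The sample below is a trimmed copy of the output above so it runs without a cluster; in practice you would pipe the cilium endpoint list command into the same awk program. It assumes the column order shown above.

```shell
# Trimmed sample of the table above; stands in for the real command output.
sample='ENDPOINT   POLICY (ingress)   POLICY (egress)   IDENTITY   LABELS (source:key[=value])             IPv6   IPv4         STATUS
           ENFORCEMENT        ENFORCEMENT
24         Disabled           Disabled          69014      k8s:app.kubernetes.io/managed-by=Helm          10.0.0.247   ready
                                                           k8s:app=minio
                                                           k8s:io.kubernetes.pod.namespace=minio'

# Header and label continuation rows do not start with a number, so they are skipped.
summary="$(printf '%s\n' "$sample" | awk '
$1 ~ /^[0-9]+$/ {
  printf "endpoint=%s ingress=%s egress=%s identity=%s\n", $1, $2, $3, $4
}')"

echo "$summary"
# prints: endpoint=24 ingress=Disabled egress=Disabled identity=69014
```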

Labels and Security Identity



Highlight Kubernetes Labels in your Terminal with AWK

A quick tip and bit of code: if you’re outputting a lot of Kubernetes metadata using the --show-labels flag, it can feel like looking for a needle in a haystack. The snippet below colorizes key label outputs to make them stand out.

The Code Snippet

When working with Kubernetes, it can be helpful to visually scan for certain node labels—such as service.cilium.io/node=... or custom readiness flags like ingress-ready=true. Using a simple awk script, we can colorize these labels directly in our terminal output. This script uses ANSI escape codes to wrap matched text in color and awk’s gsub() function to apply substitutions line by line. It’s a lightweight and effective way to highlight key data points in otherwise dense CLI output.

kubectl get ciliumnodes --show-labels | awk '
BEGIN {
  color_start = "\033[1;36m"; # cyan
  color_end = "\033[0m";
}
{
  gsub(/service\.cilium\.io\/node=[^, ]+/, color_start "&" color_end);
  gsub(/ingress-ready=true/, color_start "&" color_end);
  print
}'

Screenshot Example


Screenshot showing the use of an awk command to color-highlight the ingress-ready=true label in red within kubectl get ciliumnodes --show-labels output in a Kubernetes terminal session.

Breakdown of the Code

We pipe the output of the kubectl command to awk. The BEGIN block sets up the ANSI color codes used to highlight the matched patterns.

  • \033[1;36m is an ANSI escape code that starts cyan-colored text.
  • \033[0m resets the text color back to normal.

gsub(...)

These two lines apply substitutions to each input line:

  • gsub() is a global substitution function that replaces all matches in the line.
    • service\.cilium\.io\/node=[^, ]+ matches a full key-value pair like service.cilium.io/node=mynode
    • [^, ]+ grabs the node value until the next comma or space
    • ingress-ready=true matches the exact label string
    • & refers to the entire matched string, which we wrap in color codes

print

This prints the modified line after substitutions are applied.

Customize the Highlight Color

You can change \033[1;36m to another color code:

  • Red: \033[1;31m
  • Green: \033[1;32m
  • Yellow: \033[1;33m
  • Blue: \033[1;34m
  • Magenta: \033[1;35m
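
If you switch colors often, one option is to wrap the snippet in a small shell function that takes the color code and the pattern as arguments. This is a sketch rather than part of the original snippet: the highlight function name and its two parameters are made up for this example.

```shell
# highlight SGR PATTERN: colorize every match of PATTERN read from stdin.
# Hypothetical helper; SGR is the part of the color code between "\033[" and "m",
# for example "1;31" for red.
highlight() {
  awk -v sgr="$1" -v pattern="$2" '
  BEGIN {
    color_start = "\033[" sgr "m"
    color_end = "\033[0m"
  }
  {
    # pattern is a string, so awk treats it as a dynamic regex
    gsub(pattern, color_start "&" color_end)
    print
  }'
}

# Sample line standing in for kubectl output
printf '%s\n' 'kind-worker ingress-ready=true,kubernetes.io/os=linux' |
  highlight '1;31' 'ingress-ready=true'
```

With a cluster available, the same function would sit at the end of the pipe in place of the inline awk script. Because awk processes escape sequences in -v values, a backslash in the pattern generally needs to be doubled.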

A Final Note on sub() vs gsub()

  • sub() replaces only the first occurrence of the regex in the line
  • gsub() replaces all occurrences of the regex in the line
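
A quick way to see the difference is to run both functions on a line that contains the same label twice. The input line here is made up for the demonstration, and square brackets stand in for the color codes so the result is easy to read.

```shell
line='ingress-ready=true,zone=a,ingress-ready=true'

# sub() marks only the first match on the line
first_only="$(printf '%s\n' "$line" | awk '{ sub(/ingress-ready=true/, "[&]"); print }')"

# gsub() marks every match on the line
all_matches="$(printf '%s\n' "$line" | awk '{ gsub(/ingress-ready=true/, "[&]"); print }')"

printf 'sub:  %s\n' "$first_only"
printf 'gsub: %s\n' "$all_matches"
# sub:  [ingress-ready=true],zone=a,ingress-ready=true
# gsub: [ingress-ready=true],zone=a,[ingress-ready=true]
```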

Regards



Dean Lewis


Tanzu Observability – First look at monitoring OpenShift & VMware Cloud on AWS

Recently, I was involved in some work to help the VMware Tanzu Observability team update their deliverables for OpenShift. Now that it’s generally available, I found some time to test it out in my lab.

For this blog post, I am going to pull in metrics from my VMware Cloud on AWS environment and the Red Hat OpenShift Cluster which is deployed upon it.

What is Tanzu Observability?

We should probably start with what Observability is. I could reinvent the wheel, but instead VMware has you covered with this helpful page.

Below is the shortened table comparison.

Monitoring vs. Observability

As a developer you want to focus on developing the application, but you also do need to understand the rest of the stack to a point. In the middle, you have a Site Reliability Engineer (SRE), who covers the platform itself, and availability to ensure the app runs as best it can. And finally, we have the platform owner, where the applications and other services are located.

Somewhere in the middle, when it comes to tooling, you need to cover areas such as those listed below:

  • Application Observability & Root Cause Analysis
    • App-aware Troubleshooting & Root Cause Analysis
  • Distributed Tracing
  • CI/CD Monitoring
  • Analytics with Query Language and high reliability, granularity, cardinality, and retention
  • Full-Stack Apps & Infra Telemetry as a Service
  • Infra Monitoring
    • Performance Optimization
    • Capacity and Cost Optimization
    • Configuration and Compliance

So now you are thinking, OK, but VMware has vRealize Operations that gives me a lot of data, so why is there a new product for this?

vRealize Operations and Tanzu Observability come together – delivering full stack monitoring and observability from both the infra-up and app-down perspective, equipping both teams in the org to meet common goals.

Monitoring & Observability

It is about the right tool for the right team and bringing harmony between them, which is why at VMware the focus has been on covering the needs of both teams across the two products.

vRealize Operations gives you SLA metrics for your infrastructure and even application awareness. However, Tanzu Observability brings more application-focused features that allow you, as a business, to report on the application experience of your end users and customers using an SLA/SLO/KPI approach, with extensibility to provide an Experience Level Agreement (XLA) type capability.

VMware Tanzu Observability by Wavefront delivers enterprise-grade observability and analytics at scale. Monitor everything from full-stack applications to cloud infrastructures with metrics, traces, event logs, and analytics.

High level features include:

To follow this blog, you can also easily get yourself access to Tanzu Observability.

Configuring data ingestion into Tanzu Observability using the native integrations

Configuring the OpenShift (Kubernetes) Integration using Helm

First, we need to create an API key that we can use to connect our locally deployed Wavefront services to the SaaS service and send data.