
Tanzu Service Mesh – First look at deploying, configuring and managing an app between clusters

I’ve finally found the time to dive into Tanzu Service Mesh (TSM), a product which has intrigued me greatly. In this blog post I cover setting up TSM, and configuring application connectivity between two clusters running in different public clouds.

What is a Service Mesh?

In software architecture, a service mesh is a dedicated infrastructure layer that facilitates service-to-service communication between services or microservices, using a proxy. This provides additional capabilities such as a transit layer between platforms, observability, traffic management and more.

A service mesh provides the capability to secure and simplify your application communications between platforms/clusters/clouds without needing to introduce lots of other software such as API gateways.

A service mesh introduces a proxy component that sits side by side with every deployed service, and all inbound and outbound communication to/from the service flows through that proxy. This allows the proxy to provide all the functions needed to offer the capabilities discussed above and in the rest of this blog post.
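To make the sidecar pattern concrete, here is a simplified, hypothetical view of what a pod looks like once the proxy has been injected — the names and image tags below are illustrative, not taken from a real TSM deployment:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: catalog
spec:
  containers:
  # The application container, unchanged by the mesh
  - name: catalog
    image: example/catalog:1.0       # illustrative image
    ports:
    - containerPort: 8080
  # The injected sidecar proxy; all traffic to and from
  # the app container is transparently routed through it
  - name: istio-proxy
    image: istio/proxyv2:1.12.0      # illustrative version
```

The application itself needs no changes; the mesh gains control of the traffic purely by sitting in the network path.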

What is Tanzu Service Mesh (TSM)?

TSM builds upon the principles mentioned above, offering consumers a full enterprise-class service mesh, designed to solve challenges across distributed microservice applications by extending service mesh services outside Kubernetes clusters and providing a unified operational layer across platforms and technologies.

  • Extends the service mesh capability (discovery, connectivity, control, security, and observability) to users and data, and across clusters and clouds.

  • Facilitates the development and management of distributed applications across multiple clusters, multiple clouds, and in hybrid-cloud environments with Global Namespaces (discussed below), supporting federation across organizational boundaries, technologies, and service meshes.

  • Implements consistent application-layer traffic management and security policies across all your clusters and clouds.

Global Namespace – is a key concept introduced by TSM, that extends service mesh capabilities for resources in a distributed application by arranging these objects in a logical group. A global namespace is therefore not tied to a single cluster or cloud, connecting resources between two or more clusters/clouds. Each global namespace manages service discovery, observability, encryption, policies, and service-level objectives (SLOs) for its objects regardless of where they reside: in multiple clusters, sites, or clouds.

To learn more about the concepts introduced by TSM, please read the below document:

Tanzu Service Mesh - Logical Overview

Encryption for Data in Flight

TSM provides the value of a transit between clusters and clouds. With that come many challenges, one of which is achieving end-to-end encryption for in-flight traffic; this can be mandatory to comply with regulatory and security governance measures.

Tanzu Service Mesh includes a top-level certificate authority (CA) to provide a trusted identity to each node on the network. In the case of microservices architecture, those nodes are the pods that run the services. Tanzu Service Mesh can set up end-to-end mutual transport layer security (mTLS) encryption using a CA function.
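TSM configures this for you automatically, but to give a sense of what sits underneath: in standalone Istio (which TSM builds on), mesh-wide strict mTLS is expressed with a PeerAuthentication resource roughly like the one below. This is a sketch for context only — you should not apply it to a TSM-managed cluster yourself:

```yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system   # mesh-wide when created in the root namespace
spec:
  mtls:
    mode: STRICT            # only accept mutually authenticated TLS traffic
```

With STRICT mode, the sidecar proxies reject any plaintext traffic, so every service-to-service hop is encrypted using the workload certificates issued by the CA.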

Activating Tanzu Service Mesh

TSM is available from VMware as a SaaS service only, meaning there is no on-premises deployment for the control plane. To activate it, click the link provided when you request a trial or purchase the product, and link it to your VMware Cloud Services Organisation (or create a new org if necessary).

You must belong to an organization before you can access Tanzu Service Mesh. 

If you are an organization owner, you have access to all the resources of the organization. You can add cloud services to your organization and then invite users. 

After you add Tanzu Service Mesh to your organization, you can add users to the organization to give them access to Tanzu Service Mesh. 

For more information about organizations and adding users to an organization, see the VMware Cloud Services documentation.

Once you’ve activated the product, you will see it in your Cloud Services Portal as below.

VMware Cloud Services - Tanzu Service Mesh

Pre-requisites for Installation

This is something which caused me issues in the past, where I tried to deploy TSM to a cluster that did not have enough resources to begin with.

- Cluster with 6 nodes (though lower down in the text the documentation does say a minimum of 3)
> I've queried this

- Every node in the cluster must have at least 250 milliCPU and 650 MB of memory available

- Pod limits - requires a quota of at least 3 pods for each node on the cluster, and at least 30 pods for the whole cluster

In short, make sure you meet the minimum node count and have enough spare resources on each node.

Regarding supported Kubernetes distributions, the team validates against a number of enterprise Kubernetes solutions as well as upstream Kubernetes.

I deployed two new environments for my testing, an EKS and an AKS cluster, using the following commands.

eksctl create cluster --name veducate-eks-tsm --region us-west-1 --nodegroup-name standard --node-type m5.xlarge --managed

az aks create -g aks-tsm --name aks-tsm-cluster --node-count 4 --ssh-key-value $HOME/.ssh/id_rsa.pub -s standard_d4s_v5

Onboarding your Clusters into Tanzu Service Mesh

Onboarding involves registering the cluster with Tanzu Service Mesh and installing the necessary software on the cluster.

During onboarding, you can choose to install Tanzu Service Mesh in all the namespaces on the cluster or choose to not install Tanzu Service Mesh on specific namespaces by defining namespace exclusions. You exclude a namespace if you don’t want to enable the service mesh capabilities in that namespace. The system namespaces on the cluster, such as kube-system, kube-public, and istio-system, are excluded from Tanzu Service Mesh by default.

You can define exclusions for namespaces that do not yet exist in the cluster but that you plan to create.

Tanzu Service Mesh - New Workflow - Onboard a new cluster

  • Enter a name for the cluster as it should appear in the TSM inventory
  • Click to generate the installation links
  • In step 2, apply the registration YAML on the cluster, and then apply the token configuration

Once you’ve added these configurations, TSM will wait for the cluster to create a secure connection back to the TSM control plane SaaS service.

Tanzu Service Mesh - Onboard a Cluster

  • When onboarding the cluster, you can choose which namespaces TSM will apply to. In the below image, you can see this as I add the second cluster. (It can be modified later as well.)

Tanzu Service Mesh - New Workflow - Onboard a new cluster - Namespace configuration

On the home screen, under Cluster Overview, you will see your attached clusters.

Tanzu Service Mesh - Home - Cluster Overview

Selecting a cluster will show further information, as well as the ability to change the friendly display name and namespace exclusion rules.

Tanzu Service Mesh - Inventory - Cluster

Deploying a test application across clusters

I won’t bore you with all the in-depth details, but I wanted to show what happens when you deploy an application across clusters, as this is, of course, one of the most important features of TSM: the ability to connect disparate clusters together via a mesh, so that an application can be deployed across clusters for locality and resilience.

For this part of the blog, I am using VMware’s ACME sports store app.

Cluster 1 will run all the front-end services, Cluster 2 will run the backend catalog database that contains the details and images for the products for sale.

In Cluster 1, the application will reference the Global Namespace domain (created in the next section) to be able to access the catalog service, which is located in Cluster 2.

# Ensure the namespace you are deploying into is enabled for istio injection
kubectl label namespace default istio-injection=enabled

# Clone the repo
git clone https://github.com/vmwarecloudadvocacy/acme_fitness_demo -b dkalani-dev3

## Cluster 1 configuration (EKS)
# Deploy the istio gateway
kubectl apply -f acme_fitness_demo/istio-manifests/gateway.yaml

# Deploy the secrets used by the application
kubectl apply -f acme_fitness_demo/kubernetes-manifests/secrets.yaml

# Edit the following manifest file, change the ENV key: CATALOG_HOST value, to match a name you will use for your Global namespace
vim acme_fitness_demo/kubernetes-manifests/acme_fitness_cluster1.yaml

# Deploy the cluster services
kubectl apply -f acme_fitness_demo/kubernetes-manifests/acme_fitness_cluster1.yaml

## To access the application, get the address of the Istio Gateway
kubectl get services -n istio-system | grep istio-ingressgateway

## Navigate using a browser to http://<ingress IP>, you should see the Acme store but without pictures of products (we haven’t connected the catalog service yet)
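For reference, the edited CATALOG_HOST entry in acme_fitness_cluster1.yaml ends up looking something like the fragment below. The service name catalog is my assumption based on the repo, and sample.app.com is the GNS domain I create later in this post:

```yaml
env:
- name: CATALOG_HOST
  # GNS name of the catalog service running in Cluster 2;
  # the suffix must match the Global Namespace domain created in TSM
  value: catalog.sample.app.com
```

The key point is that the front end addresses the remote service by its GNS name, not by a cluster-local DNS name or an external load balancer address.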

We will now move onto the configuration of the second cluster (AKS).

# Deploy the Istio gateways
kubectl apply -f acme_fitness_demo/istio-manifests/gateway.yaml 

# Deploy the secrets
kubectl apply -f acme_fitness_demo/kubernetes-manifests/secrets.yaml

# Deploy the cluster2 services
kubectl apply -f acme_fitness_demo/kubernetes-manifests/acme_fitness_cluster2.yaml

Create the Global Namespace

Click the New Workflow option on the left-hand navigation window, or the button if you are already viewing the global namespaces UI page.

Tanzu Service Mesh New Global Namespace

  • Provide a name for the global namespace
  • Provide a Domain to be used across the clusters
    • In my example I used sample.app.com
    • .local is not a supported domain suffix

Tanzu Service Mesh - New Global Namespace - General Details

  • Map your services together by selecting the clusters and namespaces where your app is located.
  • The service preview will show you the deployments that are configured in the namespace
  • The “Add service mappings” option allows you to continue adding multiple clusters.

Tanzu Service Mesh - New Global Namespace - Service Mapping

Tanzu Service Mesh - New Global Namespace - Public Services

Tanzu Service Mesh - New Global Namespace - GSLB & Resiliency

  • Review the configuration summary, then click finish.

Tanzu Service Mesh - New Global Namespace - Configuration Summary

Select the new global namespace.

Tanzu Service Mesh - New Global Namespace created

After a few minutes you will see the GNS Topology map built up, including showing traffic paths between the applications and services across the clusters. You can read more about this view below:

Tanzu Service Mesh - Global Namespace - GNS Topology

You can also select each service displayed and interrogate it for further information.

Tanzu Service Mesh - Global Namespace - GNS Topology - Service view

Deploying the Pac-Man App

I decided to deploy my own favourite demo app, to understand the configuration a little more.

git clone https://github.com/saintdle/pacman-tanzu.git

I created a new gateway and virtual service for the Istio component of TSM, matching the labels to my pacman deployment where needed. These were deployed into my app’s namespace (the previous example was all done in the default namespace).

This is deployed to both clusters.

apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: pacman-gateway
  namespace: pacman
spec:
  selector:
    istio: ingressgateway # use istio default controller
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "*"
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: pacman
  namespace: pacman
spec:
  hosts:
  - "*"
  gateways:
  - pacman-gateway
  http:
  - match:
    - uri:
        prefix: /
    route:
    - destination:
        host: pacman
        port:
          number: 8080

In Cluster 1 (EKS), which will host the front end, I deploy the following files:

  • pacman-tanzu/deployments/pacman-deployment.yaml
    • Within this file, I edit the Mongo host location to match the GNS I will deploy:
        env:
        - name: MONGO_SERVICE_HOST
          value: mongo.pacman.svc
  • pacman-tanzu/services/pacman-service.yaml
    •  Change the service to ClusterIP
  • pacman-tanzu/security/rbac.yaml
  • pacman-tanzu/security/secret.yaml
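After the ClusterIP change, the pacman service looks roughly like the fragment below. The selector label is an assumption — it must match whatever pod labels pacman-deployment.yaml actually uses:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: pacman
  namespace: pacman
spec:
  type: ClusterIP       # changed from LoadBalancer; the Istio gateway now handles ingress
  selector:
    name: pacman        # assumption: must match the deployment's pod labels
  ports:
  - port: 8080
    targetPort: 8080
```

The service name and port line up with the VirtualService above, which routes traffic from the gateway to host pacman on port 8080.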

Within Cluster 2 (AKS), which hosts the database backend, I deploy the following files:

  • pacman-tanzu/deployments/mongo-deployment.yaml
  • pacman-tanzu/persistentvolumeclaim/mongo-pvc.yaml
  • pacman-tanzu/services/mongo-service.yaml
  • pacman-tanzu/security/rbac.yaml
  • pacman-tanzu/security/secret.yaml

Now, to create my GNS, I follow similar steps as above; however, I select the “pacman” namespace and use the GNS domain pacman.svc.

Once everything is deployed, you can access the application just as in the previous example.

One issue I came across is that the topology view doesn’t show the same connectivity flow between clusters. This is because the TSM topology mapping component currently only covers HTTP/S traffic. As my web front end connects directly to the DB on TCP port 27017, this traffic doesn’t show.

This only affects the visual view and does not affect any of the features or the implementation; my web front end can still communicate with the DB across clusters.

Tanzu Service Mesh - Global Namespace - GNS Topology - Pacman

 

Determine the health of the components on a cluster

The Tanzu Service Mesh Lifecycle Manager Operator automatically manages the Tanzu Service Mesh components on onboarded clusters, and you can use it to determine the health of the components on a cluster.

On a given cluster, you can run the following command:

kubectl --context {cluster_name} -n vmware-system-tsm get tsmclusters.tsm.vmware.com tsm-client-cluster -o yaml

# Below is an example output from my EKS cluster
apiVersion: tsm.vmware.com/v1
kind: TsmCluster
metadata:
  creationTimestamp: "2022-01-25T07:56:32Z"
  generation: 5
  name: tsm-client-cluster
  namespace: vmware-system-tsm
  resourceVersion: "8085"
  uid: 4701d094-5dc7-4545-ad85-1738b2deeda7
spec:
  uuid: ""
  version: v1.13.8
status:
  components:
  - healthState: Healthy
    kind: Deployment
    name: k8s-cluster-manager
    previousStates:
    - lastUpdateTime: "2022-01-25T07:56:39Z"
      state: ""
    previousVersion: ""
    size: 1
    state: OK
    version: v4.2.0-1
  - healthState: Healthy
    kind: Deployment
    name: allspark-ws-proxy
    previousStates:
    - lastUpdateTime: "2022-01-25T07:56:39Z"
      state: ""
    previousVersion: ""
    size: 1
    state: OK
    version: v3.2.0
  - healthState: Unknown
    kind: Job
    name: kickstarter
    previousStates:
    - lastUpdateTime: "2022-01-25T07:56:39Z"
      state: ""
    previousVersion: ""
    size: 0
    state: OK
    version: 1a4ccab4
  currentVersion: v1.13.8
  previousVersion: ""
  state: OK

Troubleshooting

The usual Kubernetes troubleshooting applies: use the “kubectl” command line and verify your configurations. And of course, TSM gives you a fantastic way to visualise your configurations and connectivity.

As TSM layers on top of Istio, you can also use the “istioctl” command-line tool. But read the warning below.

Using the istioctl tool to modify or configure a TSM deployment is not supported in any way, as this is outside of the control plane. Don't do it!

The single reason I mention this tool is its “analyze” command, which may assist you when you make mistakes deploying a gateway or virtual service, such as in this example:

> istioctl analyze
Info [IST0118] (Service default/loadgen) Port name loadgen (port: 80, targetPort: 8089) doesn't follow the naming convention of Istio port.
Error: Analyzers found issues when analyzing namespace: default.
See https://istio.io/v1.12/docs/reference/config/analysis for more information about causes and resolutions.

Wrap-up and Resources

Finally finding time to test out TSM was fun. I think it really started to make sense to me once I had created what is essentially an encrypted backbone between two completely separate clusters deployed into two different clouds (AWS and Azure).

Then through the global namespace, I could split my application deployments between both clusters but connect them together and use them as if they operated within a single cluster.

And this is just the tip of the iceberg of TSM’s capabilities.

 

Regards

Dean Lewis
