Deploying Tanzu Kubernetes Grid to AWS fails with ‘InstanceProvisionFailed’

The Issue

When deploying Tanzu Kubernetes Grid to AWS, the deployment was failing with the following output:

unable to set up management cluster, : unable to wait for cluster and get the cluster kubeconfig: error waiting for cluster to be provisioned (this may take a few minutes): cluster creation failed, reason:'InstanceProvisionFailed @ Machine/tkg-aws-mgmt-control-plane-dqb4v', message:'1 of 2 completed'

The Cause

When we reviewed the CAPA (Cluster API Provider AWS) logs, we found the following error logged:

controllers/AWSMachine "msg"="Failed to create AWS Secret entry" "error"="AccessDeniedException: User: arn:aws:iam::036776340102:user/veducate-tkg is not authorized to perform: secretsmanager:CreateSecret on resource: aws.cluster.x-k8s.io/23e9ea17-e1b3-4609-85f2-62b2b9044d94-0\n\tstatus code: 400, request id: de94dd21-6dfb-4387-ac76-9983f4301f97" "awsMachine"="tkg-clusters-control-plane-2lbjv" "cluster"="tkg-clusters" "machine"="tkg-clusters-control-plane-p4fmt" "namespace"="tkg-system" "secretPrefix"="aws.cluster.x-k8s.io/23e9ea17-e1b3-4609-85f2-62b2b9044d94"

You can access these logs by running the following command against the kind bootstrap cluster running in Docker.

kubectl --kubeconfig {kubeconfig_file} logs -n capa-system deployment.apps/capa-controller-manager -c manager

The kubeconfig file will be shown in your deployment output.
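
The CAPA log is noisy, so it can help to filter it down to authorization failures only. The following is a minimal sketch: the function name filter_access_denied and the KUBECONFIG_FILE variable are hypothetical, and the match strings are simply taken from the error shown above.

```shell
#!/usr/bin/env bash
# Keep only log lines that report an AWS authorization failure.
# Reads log text on stdin and writes the matching lines to stdout.
# The "|| true" keeps the exit status at 0 when nothing matches.
filter_access_denied() {
  grep -E 'AccessDeniedException|is not authorized to perform' || true
}

# Usage (assumes KUBECONFIG_FILE holds the path printed by the installer):
#   kubectl --kubeconfig "$KUBECONFIG_FILE" logs -n capa-system \
#     deployment.apps/capa-controller-manager -c manager | filter_access_denied
```

If the filter prints nothing, the provisioning failure is probably not a permissions problem and the full log is worth reading.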

If we check the ARN in the AWS Console, we can see there is no permission for AWS Secrets Manager.
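
The same check can be made from the command line. The sketch below pulls the principal ARN and the denied action out of the error message, then asks IAM to simulate that action. The function names parse_access_denied and check_permission are hypothetical, and the second function assumes the AWS CLI is installed and that your credentials are allowed to call the IAM policy simulator.

```shell
#!/usr/bin/env bash
# Extract the IAM principal and the denied action from an AccessDenied message.
# Reads the message on stdin and prints "<principal-arn> <action>".
parse_access_denied() {
  sed -n 's/.*User: \(arn:[^ ]*\) is not authorized to perform: \([A-Za-z0-9:*-]*\).*/\1 \2/p'
}

# Ask IAM whether a principal may perform an action.
# $1 = principal ARN, $2 = action name.
# Prints the simulator decision: allowed, implicitDeny, or explicitDeny.
check_permission() {
  aws iam simulate-principal-policy \
    --policy-source-arn "$1" \
    --action-names "$2" \
    --query 'EvaluationResults[].EvalDecision' \
    --output text
}

# Usage, with the values from the error above:
#   check_permission arn:aws:iam::036776340102:user/veducate-tkg secretsmanager:CreateSecret
```

A result other than allowed is consistent with the AccessDeniedException seen in the CAPA logs.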

The pre-reqs for the AWS ARN setup are detailed here. When you first deploy a management cluster to AWS, it will create a CloudFormation Stack to provision the correct roles and permissions against your ARN.

However, if the account you are using to connect to AWS does not have the right permissions, you can end up with odd behaviour like this.

Your AWS account must have at least the following permissions:

(Screenshot: TKG AWS ARN Access)

The Fix

You probably do not currently have the access needed to do this because you are working in a shared AWS Organisation or Account. Once you have resolved that, manually add the Secrets Manager permission to the ARN. I have also detailed below where the TKG installer creates it.

Alternatively, you could run the delete cluster command and then re-run the deployment from scratch, now that your account has the correct permissions to assign the Secrets Manager permission.

In this document, the Secrets Manager permissions are mapped in the nodes.tkg.cloud.vmware.com IAM policy.

{
    "Action": [
        "secretsmanager:DeleteSecret",
        "secretsmanager:GetSecretValue"
    ],
    "Resource": [
        "arn:*:secretsmanager:*:*:secret:aws.cluster.x-k8s.io/*"
    ],
    "Effect": "Allow"
},

Regards

 
