The Issue
When deploying Tanzu Kubernetes Grid to AWS, the deployment was failing with the following output:
unable to set up management cluster, : unable to wait for cluster and get the cluster kubeconfig: error waiting for cluster to be provisioned (this may take a few minutes): cluster creation failed, reason:'InstanceProvisionFailed @ Machine/tkg-aws-mgmt-control-plane-dqb4v', message:'1 of 2 completed'
The Cause
When we reviewed the CAPA (Cluster API Provider AWS) logs, we found the following error logged:
controllers/AWSMachine "msg"="Failed to create AWS Secret entry" "error"="AccessDeniedException: User: arn:aws:iam::036776340102:user/veducate-tkg is not authorized to perform: secretsmanager:CreateSecret on resource: aws.cluster.x-k8s.io/23e9ea17-e1b3-4609-85f2-62b2b9044d94-0\n\tstatus code: 400, request id: de94dd21-6dfb-4387-ac76-9983f4301f97" "awsMachine"="tkg-clusters-control-plane-2lbjv" "cluster"="tkg-clusters" "machine"="tkg-clusters-control-plane-p4fmt" "namespace"="tkg-system" "secretPrefix"="aws.cluster.x-k8s.io/23e9ea17-e1b3-4609-85f2-62b2b9044d94"
You can access these logs by running the following against your Kind cluster running in Docker:
kubectl --kubeconfig {kubeconfig_file} logs -n capa-system deployment.apps/capa-controller-manager -c manager
The kubeconfig file will be shown in your deployment output.
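The controller logs can be long, so it may help to pull out just the IAM actions that were denied. The following is a minimal sketch, assuming the errors follow the same "is not authorized to perform" wording as the one above. The function name denied_actions is made up for this example.

```shell
# Print each distinct IAM action named in an
# "is not authorized to perform: <service>:<Action>" error.
# Reads log text from stdin. (denied_actions is a hypothetical helper name.)
denied_actions() {
  grep -o 'is not authorized to perform: [A-Za-z0-9-]*:[A-Za-z0-9]*' \
    | sed 's/.*perform: //' \
    | sort -u
}

# Usage, with the same kubeconfig placeholder as the command above:
# kubectl --kubeconfig {kubeconfig_file} logs -n capa-system \
#   deployment.apps/capa-controller-manager -c manager | denied_actions
```

For the error shown earlier, this reduces the log line to the single action name secretsmanager:CreateSecret.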
If we check the ARN in the AWS Portal, we can see there is no permission for AWS Secrets Manager.
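The same check can be done from the command line with the IAM policy simulator. This is a sketch rather than a required step. It assumes the AWS CLI is installed and configured, and that the identity running it is allowed to call iam:SimulatePrincipalPolicy. The function name check_action is made up for this example.

```shell
# Ask the IAM policy simulator whether a principal may perform an action.
# $1 = IAM user or role ARN, $2 = action name, e.g. secretsmanager:CreateSecret
# Prints the decision: allowed, implicitDeny or explicitDeny.
# (check_action is a hypothetical helper name.)
check_action() {
  aws iam simulate-principal-policy \
    --policy-source-arn "$1" \
    --action-names "$2" \
    --query 'EvaluationResults[0].EvalDecision' \
    --output text
}

# Usage, with the ARN taken from the error message above:
# check_action arn:aws:iam::036776340102:user/veducate-tkg secretsmanager:CreateSecret
```

Any result other than allowed means the permission is still missing for that ARN.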
The pre-reqs for the AWS ARN setup are detailed here. When you first deploy a management cluster to AWS, it will create a CloudFormation Stack to provision the correct roles and permissions against your ARN.
However, if the account you are using to connect to AWS does not have the right permissions, you can end up with odd behaviour like this.
Your AWS account must have at least the following permissions:
- Required IAM Resources: Tanzu Kubernetes Grid creates these resources when you deploy a management cluster to your AWS account for the first time.
- Required Permissions for tanzu management-cluster create: Tanzu Kubernetes Grid uses these permissions when you run tanzu management-cluster create or deploy your management clusters from the installer interface.
The Fix
You probably do not currently have permission to do this because you are working in a shared AWS Organisation or account. Once you've resolved this, manually add the permission to the ARN. I've also detailed below where the TKG installer creates it.
Alternatively, you could run the delete cluster command, then re-run the deployment from scratch now that your account has the correct permissions to assign the Secrets Manager permission.
In this document, Secrets Manager is mapped in the nodes.tkg.cloud.vmware.com IAM policy.
{
  "Action": [
    "secretsmanager:DeleteSecret",
    "secretsmanager:GetSecretValue"
  ],
  "Resource": [
    "arn:*:secretsmanager:*:*:secret:aws.cluster.x-k8s.io/*"
  ],
  "Effect": "Allow"
},
Regards