
aws_eks: Error creating FargateCluster in cn-north-1 due to CoreDnsComputeTypePatch creation error #26613

Open
jlubins opened this issue Aug 2, 2023 · 6 comments
Labels
@aws-cdk/aws-eks (Related to Amazon Elastic Kubernetes Service) · bug (This issue is a bug.) · effort/medium (Medium work item – several days of effort) · p1

Comments


jlubins commented Aug 2, 2023

Describe the bug

Towards the end of a FargateCluster deployment, several resources fail to create, resulting in a rollback/delete.

Expected Behavior

I expect the cluster to be created smoothly; I believe FargateCluster is supported in this region, and the same configuration has deployed successfully in us-east-1.

Current Behavior

When creating a resource with a logical ID k8sclusterCoreDnsComputeTypePatch2EEF5C89, it fails with the following status reason:

CloudFormation did not receive a response from your Custom Resource. Please check your logs for requestId [4af278ec-eb20-4abc-8d38-4e76661d6112]. If you are using the Python cfn-response module, you may need to update your Lambda function code so that CloudFormation can attach the updated version.

Creation of the remaining dependent resources also fails because this one does.

Reproduction Steps

Cluster creation code:

cluster = eks.FargateCluster(
    self,
    "k8s-cluster",
    cluster_name=f"k8s-{stage_name}",
    version=eks.KubernetesVersion.V1_26,
    vpc=self.vpc,
    vpc_subnets=[subnet_selection],
    cluster_logging=[
        eks.ClusterLoggingTypes.API,
        eks.ClusterLoggingTypes.AUTHENTICATOR,
        eks.ClusterLoggingTypes.SCHEDULER,
    ],
    kubectl_layer=lambda_layer_kubectl_v26.KubectlV26Layer(
        self, "kubectl-v26-layer"
    ),
    masters_role=masters_role,
)

Possible Solution

Possibly the patch being applied requires global internet access but would need to use a mirror in China? Other than that, I'm not sure why this would fail only in China.

Additional Information/Context

No response

CDK CLI Version

2.86.0

Framework Version

No response

Node.js Version

18.30

OS

Mac OS X

Language

Python

Language Version

3.9.15

Other information

No response

@jlubins jlubins added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Aug 2, 2023
@github-actions github-actions bot added the @aws-cdk/aws-eks Related to Amazon Elastic Kubernetes Service label Aug 2, 2023
@pahud pahud added the investigating This issue is being investigated and/or work is in progress to resolve the issue. label Aug 2, 2023
@pahud pahud self-assigned this Aug 2, 2023
Contributor

pahud commented Aug 3, 2023

Yes, I can reproduce this in cn-north-1, but I can't figure out the root cause off the top of my head.

Making this a p1. Will update here if I find anything.


@pahud pahud added p1 effort/medium Medium work item – several days of effort and removed investigating This issue is being investigated and/or work is in progress to resolve the issue. needs-triage This issue or PR still needs to be triaged. labels Aug 3, 2023
@pahud pahud removed their assignment Aug 3, 2023

jlubins commented Aug 5, 2023

I think I figured out the reason, or at least a solution. When I create this cluster, I use a subnet selection that includes the availability zones with EKS capacity, cn-north-1a and cn-north-1b. A couple of the subnets in those AZs were completely private (no VPC endpoints). To create the cluster, CDK creates an ENI that uses those fully private subnets, and I'm guessing the ClusterAwsAuthmanifest and ClusterCoreDNSComputeTypePatch custom resources need internet access, as they seem to time out.

When I change the subnet selection to explicitly be subnets that have VPC endpoints, I am able to finish creating the cluster without a problem.

If these patches do require internet access, would it be possible to emit a warning at synth time stating that the selected subnets may not be suitable? I believe I've seen warnings like this before when passing subnets into another resource. Failing that, documenting this somewhere would be helpful.
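For anyone hitting the same timeout, a minimal sketch of the workaround described above, in the same stack context as the original snippet. It pins the cluster to subnets that actually have connectivity; the subnet group name "with-endpoints" is a placeholder for whatever your VPC's endpoint-equipped subnet group is called (assumes aws-cdk-lib v2):

```python
from aws_cdk import aws_ec2 as ec2, aws_eks as eks

# Select only the subnets that have VPC endpoints / outbound access, so the
# custom-resource handlers (AwsAuth manifest, CoreDnsComputeTypePatch) don't
# time out. "with-endpoints" is a hypothetical subnet group name.
subnet_selection = ec2.SubnetSelection(
    subnet_group_name="with-endpoints",
)

cluster = eks.FargateCluster(
    self,
    "k8s-cluster",
    version=eks.KubernetesVersion.V1_26,
    vpc=self.vpc,
    vpc_subnets=[subnet_selection],
)
```

Selecting by subnet group name (rather than by AZ) is what guarantees the ENIs and handlers land only in subnets you know have connectivity.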

@caretak3r

+1
I am having this issue with private subnets in us-east-1/us-west-2, so it's likely not region-specific. We use private subnets with VPC endpoints for most services, and this is still not working.


Howlla commented Nov 14, 2023

I agree this is not region-specific. I think CoreDNS on EKS Fargate currently needs a public subnet for the patch to be applied. This is a bug that should be fixed.


Temmy-dev commented May 7, 2024

I agree that it is not region-specific, and I support the idea from @Howlla. I created a NAT gateway in the VPC's public subnet and adjusted my route table to point at the NAT gateway ID, and then I was able to avoid this error. This might help someone else too.
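The NAT-gateway fix above can also be expressed directly in CDK instead of being wired up by hand: a Vpc with PRIVATE_WITH_EGRESS subnets gets NAT gateways and the corresponding routes created automatically, giving the custom-resource handlers outbound access. A minimal sketch (names and CIDR masks are illustrative; assumes aws-cdk-lib v2):

```python
from aws_cdk import aws_ec2 as ec2

# Let CDK provision the NAT gateway and routes: PRIVATE_WITH_EGRESS subnets
# are automatically routed through a NAT gateway placed in a PUBLIC subnet.
vpc = ec2.Vpc(
    self,
    "EksVpc",
    nat_gateways=1,  # one NAT gateway; bump for per-AZ redundancy
    subnet_configuration=[
        ec2.SubnetConfiguration(
            name="public",
            subnet_type=ec2.SubnetType.PUBLIC,
            cidr_mask=24,
        ),
        ec2.SubnetConfiguration(
            name="private-egress",
            subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS,
            cidr_mask=24,
        ),
    ],
)
```

Passing this VPC to FargateCluster and selecting the "private-egress" group for vpc_subnets should avoid the manual route-table edits described above.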
