
aws_eks: Error creating FargateCluster in cn-north-1 due to CoreDnsComputeTypePatch creation error #26613

Open
jlubins opened this issue Aug 2, 2023 · 6 comments
Labels
@aws-cdk/aws-eks (Related to Amazon Elastic Kubernetes Service) · bug (This issue is a bug.) · effort/medium (Medium work item – several days of effort) · p1

Comments


jlubins commented Aug 2, 2023

Describe the bug

Towards the end of a FargateCluster deployment, several resources fail to create, resulting in a rollback/delete.

Expected Behavior

I expect the cluster to be created smoothly; I believe FargateCluster is supported in this region, and the same configuration has deployed successfully in us-east-1.

Current Behavior

When creating a resource with a logical ID k8sclusterCoreDnsComputeTypePatch2EEF5C89, it fails with the following status reason:

CloudFormation did not receive a response from your Custom Resource. Please check your logs for requestId [4af278ec-eb20-4abc-8d38-4e76661d6112]. If you are using the Python cfn-response module, you may need to update your Lambda function code so that CloudFormation can attach the updated version.

Creation of the remaining dependent resources also fails because this one does.

Reproduction Steps

Cluster creation code:

cluster = eks.FargateCluster(
    self,
    "k8s-cluster",
    cluster_name=f"k8s-{stage_name}",
    version=eks.KubernetesVersion.V1_26,
    vpc=self.vpc,
    vpc_subnets=[subnet_selection],
    cluster_logging=[
        eks.ClusterLoggingTypes.API,
        eks.ClusterLoggingTypes.AUTHENTICATOR,
        eks.ClusterLoggingTypes.SCHEDULER,
    ],
    kubectl_layer=lambda_layer_kubectl_v26.KubectlV26Layer(
        self, "kubectl-v26-layer"
    ),
    masters_role=masters_role,
)

Possible Solution

Possibly the patch being applied requires global internet access but would need to use a mirror in China? Other than that, I'm not sure why this would fail only in China.

Additional Information/Context

No response

CDK CLI Version

2.86.0

Framework Version

No response

Node.js Version

18.30

OS

Mac OS X

Language

Python

Language Version

3.9.15

Other information

No response

@jlubins jlubins added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Aug 2, 2023
@github-actions github-actions bot added the @aws-cdk/aws-eks Related to Amazon Elastic Kubernetes Service label Aug 2, 2023
@pahud pahud added the investigating This issue is being investigated and/or work is in progress to resolve the issue. label Aug 2, 2023
@pahud pahud self-assigned this Aug 2, 2023
Contributor

pahud commented Aug 3, 2023

Yes, I can reproduce this in cn-north-1, but I can't figure out the root cause off the top of my head.

Making this a p1. Will update here if I find anything.


@pahud pahud added p1 effort/medium Medium work item – several days of effort and removed investigating This issue is being investigated and/or work is in progress to resolve the issue. needs-triage This issue or PR still needs to be triaged. labels Aug 3, 2023
@pahud pahud removed their assignment Aug 3, 2023

jlubins commented Aug 5, 2023

I think I figured out the reason, or at least a solution. When I create this cluster, I use a subnet selection that includes the availability zones with EKS capacity, cn-north-1a and cn-north-1b. A couple of the subnets in those AZs were completely private (no VPC endpoints). To create the cluster, CDK creates an ENI that uses those fully private subnets, and I'm guessing the ClusterAwsAuthmanifest and ClusterCoreDNSComputeTypePatch custom resources need internet access, as they seem to time out.

When I change the subnet selection to explicitly be subnets that have VPC endpoints, I am able to finish creating the cluster without a problem.

If these patches do require internet access, would it be possible to emit a warning at synth time stating that the selected subnets may not be suitable? I believe I've seen warnings like this before when passing subnets into another resource. Failing that, documenting this somewhere would be helpful.
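For anyone hitting the same timeout, a minimal sketch of the workaround described above, in the same stack context as the original snippet. It pins the cluster to subnets that actually have connectivity; the subnet group name "with-endpoints" is a placeholder for whatever your VPC's endpoint-equipped subnet group is called (assumes aws-cdk-lib v2):

```python
from aws_cdk import aws_ec2 as ec2, aws_eks as eks

# Select only the subnets that have VPC endpoints / outbound access, so the
# custom-resource handlers (AwsAuth manifest, CoreDnsComputeTypePatch) don't
# time out. "with-endpoints" is a hypothetical subnet group name.
subnet_selection = ec2.SubnetSelection(
    subnet_group_name="with-endpoints",
)

cluster = eks.FargateCluster(
    self,
    "k8s-cluster",
    version=eks.KubernetesVersion.V1_26,
    vpc=self.vpc,
    vpc_subnets=[subnet_selection],
)
```

Selecting by subnet group name (rather than by AZ) is what guarantees the ENIs and handlers land only in subnets you know have connectivity.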

@caretak3r

+1
I am having this issue with private subnets in us-east-1/us-west-2, so it's likely not region-specific. We use private subnets with VPC endpoints for most services, and this is still not working.


Howlla commented Nov 14, 2023

I agree this is not region-specific. I think CoreDNS on EKS Fargate currently needs a public subnet for the patch to be applied. This is a bug that should be fixed.


Temmy-dev commented May 7, 2024

I agree that it is not region-specific, and I support the idea from @Howlla. I created a NAT gateway in the VPC's public subnet and adjusted my route table to point at the NAT gateway ID, and then I was able to avoid this error. This might help someone else too.
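The NAT-gateway fix above can also be expressed directly in CDK instead of being wired up by hand: a Vpc with PRIVATE_WITH_EGRESS subnets gets NAT gateways and the corresponding routes created automatically, giving the custom-resource handlers outbound access. A minimal sketch (names and CIDR masks are illustrative; assumes aws-cdk-lib v2):

```python
from aws_cdk import aws_ec2 as ec2

# Let CDK provision the NAT gateway and routes: PRIVATE_WITH_EGRESS subnets
# are automatically routed through a NAT gateway placed in a PUBLIC subnet.
vpc = ec2.Vpc(
    self,
    "EksVpc",
    nat_gateways=1,  # one NAT gateway; bump for per-AZ redundancy
    subnet_configuration=[
        ec2.SubnetConfiguration(
            name="public",
            subnet_type=ec2.SubnetType.PUBLIC,
            cidr_mask=24,
        ),
        ec2.SubnetConfiguration(
            name="private-egress",
            subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS,
            cidr_mask=24,
        ),
    ],
)
```

Passing this VPC to FargateCluster and selecting the "private-egress" group for vpc_subnets should avoid the manual route-table edits described above.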
