Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vpc-cni pod is stuck on terminating state #3216

Open
ronbutbul opened this issue Feb 27, 2025 · 3 comments
Open

vpc-cni pod is stuck on terminating state #3216

ronbutbul opened this issue Feb 27, 2025 · 3 comments

Comments

@ronbutbul
Copy link

Hello everyone,
I have an eks running with karpenter.
im facing a weird issue with my aws-node daemonset.
one of the pods is stuck on terminating state and i see the following logs.

│ aws-node 2025-02-27 13:49:55.200164468 +0000 UTC m=+21445.503019035 write error: write /host/var/log/aws-routed-eni/ipamd.log: n │

more of that, i have an issue that the vpc cni takes a long time to release ip addresses.
maybe both problems are related

@orsenthil
Copy link
Member

│ aws-node 2025-02-27 13:49:55.200164468 +0000 UTC m=+21445.503019035 write error: write /host/var/log/aws-routed-eni/ipamd.log: n │

Are yo using any custom AMI? Could you give more details with version of EKS and how you your cluster setup?

@ronbutbul
Copy link
Author

im not using any custom AMI that i know of.
i have eks 1.27.
this is my aws-node daemonset yaml if it helps:
apiVersion: apps/v1
kind: DaemonSet
metadata:
spec:
revisionHistoryLimit: 10
selector:
matchLabels:
k8s-app: aws-node
template:
metadata:
annotations:
kubectl.kubernetes.io/restartedAt: "2024-05-21T18:37:59+03:00"
creationTimestamp: null
labels:
k8s-app: aws-node
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: kubernetes.io/os
operator: In
values:
- linux
- key: kubernetes.io/arch
operator: In
values:
- amd64
- arm64
- key: eks.amazonaws.com/compute-type
operator: NotIn
values:
- fargate
containers:
- env:
- name: ADDITIONAL_ENI_TAGS
value: '{}'
- name: AWS_VPC_CNI_NODE_PORT_SUPPORT
value: "true"
- name: AWS_VPC_ENI_MTU
value: "9001"
- name: AWS_VPC_K8S_CNI_CONFIGURE_RPFILTER
value: "false"
- name: AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG
value: "false"
- name: AWS_VPC_K8S_CNI_EXTERNALSNAT
value: "false"
- name: AWS_VPC_K8S_CNI_LOGLEVEL
value: DEBUG
- name: AWS_VPC_K8S_CNI_LOG_FILE
value: /host/var/log/aws-routed-eni/ipamd.log
- name: AWS_VPC_K8S_CNI_RANDOMIZESNAT
value: prng
- name: AWS_VPC_K8S_CNI_VETHPREFIX
value: eni
- name: AWS_VPC_K8S_PLUGIN_LOG_FILE
value: /var/log/aws-routed-eni/plugin.log
- name: AWS_VPC_K8S_PLUGIN_LOG_LEVEL
value: DEBUG
- name: DISABLE_INTROSPECTION
value: "false"
- name: DISABLE_METRICS
value: "false"
- name: DISABLE_NETWORK_RESOURCE_PROVISIONING
value: "false"
- name: ENABLE_IPv4
value: "true"
- name: ENABLE_IPv6
value: "false"
- name: ENABLE_POD_ENI
value: "false"
- name: ENABLE_PREFIX_DELEGATION
value: "false"
- name: MY_NODE_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: spec.nodeName
- name: WARM_ENI_TARGET
value: "1"
- name: WARM_IP_TARGET
value: "1"
- name: WARM_PREFIX_TARGET
value: "1"
image: 602401143452.dkr.ecr.eu-west-1.amazonaws.com/amazon-k8s-cni:v1.10.4-eksbuild.1
imagePullPolicy: IfNotPresent
livenessProbe:
exec:
command:
- /app/grpc-health-probe
- -addr=:50051
- -connect-timeout=5s
- -rpc-timeout=5s
failureThreshold: 3
initialDelaySeconds: 60
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 10
name: aws-node
ports:
- containerPort: 61678
hostPort: 61678
name: metrics
protocol: TCP
readinessProbe:
exec:
command:
- /app/grpc-health-probe
- -addr=:50051
- -connect-timeout=5s
- -rpc-timeout=5s
failureThreshold: 3
initialDelaySeconds: 1
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 10
resources:
requests:
cpu: 25m
securityContext:
capabilities:
add:
- NET_ADMIN
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /host/opt/cni/bin
name: cni-bin-dir
- mountPath: /host/etc/cni/net.d
name: cni-net-dir
- mountPath: /host/var/log/aws-routed-eni
name: log-dir
- mountPath: /var/run/aws-node
name: run-dir
- mountPath: /var/run/dockershim.sock
name: dockershim
- mountPath: /run/xtables.lock
name: xtables-lock
dnsPolicy: ClusterFirst
hostNetwork: true
initContainers:
- env:
- name: DISABLE_TCP_EARLY_DEMUX
value: "false"
- name: ENABLE_IPv6
value: "false"
image: 602401143452.dkr.ecr.eu-west-1.amazonaws.com/amazon-k8s-cni-init:v1.10.4-eksbuild.1
imagePullPolicy: IfNotPresent
name: aws-vpc-cni-init
resources: {}
securityContext:
privileged: true
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /host/opt/cni/bin
name: cni-bin-dir
priorityClassName: system-node-critical
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: aws-node
serviceAccountName: aws-node
terminationGracePeriodSeconds: 10
tolerations:
- operator: Exists
volumes:
- hostPath:
path: /opt/cni/bin
type: ""
name: cni-bin-dir
- hostPath:
path: /etc/cni/net.d
type: ""
name: cni-net-dir
- hostPath:
path: /var/run/dockershim.sock
type: ""
name: dockershim
- hostPath:
path: /run/xtables.lock
type: ""
name: xtables-lock
- hostPath:
path: /var/log/aws-routed-eni
type: DirectoryOrCreate
name: log-dir
- hostPath:
path: /var/run/aws-node
type: DirectoryOrCreate
name: run-dir
updateStrategy:
rollingUpdate:
maxSurge: 0
maxUnavailable: 10%
type: RollingUpdate
status:
currentNumberScheduled: 18
desiredNumberScheduled: 18
numberAvailable: 18
numberMisscheduled: 0
numberReady: 18
observedGeneration: 3
updatedNumberScheduled: 17

@ronbutbul
Copy link
Author

by the way, the fact that the aws-node pod is stuck on terminating, is making every deleted pod on the specific node get stuck as well

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants