When filtering etcd or api-server pods, the readiness status of the pods was not checked. #3021

Closed
yxxhero opened this issue Feb 12, 2024 · 8 comments · Fixed by kubernetes/kubernetes#123489
Labels
kind/bug Categorizes issue or PR as related to a bug. priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done.
Milestone
v1.30

Comments

@yxxhero
Member

yxxhero commented Feb 12, 2024

What keywords did you search in kubeadm issues before filing this one?

If you have found any duplicates, you should instead reply there and close this page.

If you have not found any duplicates, delete this section and continue on.

Is this a BUG REPORT or FEATURE REQUEST?

BUG REPORT

Versions

kubeadm version (use kubeadm version): all

Environment:

  • Kubernetes version (use kubectl version):
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Container runtime (CRI) (e.g. containerd, cri-o):
  • Container networking plugin (CNI) (e.g. Calico, Cilium):
  • Others:

What happened?

When filtering etcd or api-server pods, the readiness status of the pods was not checked.

What you expected to happen?

When filtering etcd or api-server pods, the readiness status of the pods should be checked.

How to reproduce it (as minimally and precisely as possible)?

Anything else we need to know?

@neolit123
Member

You need to provide more information.

@neolit123 neolit123 added the kind/bug and priority/awaiting-more-evidence labels Feb 12, 2024
@neolit123 neolit123 added this to the v1.30 milestone Feb 12, 2024
@yxxhero
Member Author

yxxhero commented Feb 12, 2024

func getRawEtcdEndpointsFromPodAnnotationWithoutRetry(client clientset.Interface) ([]string, int, error) {
	klog.V(3).Infof("retrieving etcd endpoints from %q annotation in etcd Pods", constants.EtcdAdvertiseClientUrlsAnnotationKey)
	podList, err := client.CoreV1().Pods(metav1.NamespaceSystem).List(
		context.TODO(),
		metav1.ListOptions{
			LabelSelector: fmt.Sprintf("component=%s,tier=%s", constants.Etcd, constants.ControlPlaneTier),
		},
	)
	if err != nil {
		return []string{}, 0, err
	}
	etcdEndpoints := []string{}
	for _, pod := range podList.Items {
		etcdEndpoint, ok := pod.ObjectMeta.Annotations[constants.EtcdAdvertiseClientUrlsAnnotationKey]
		if !ok {
			klog.V(3).Infof("etcd Pod %q is missing the %q annotation; cannot infer etcd advertise client URL using the Pod annotation", pod.ObjectMeta.Name, constants.EtcdAdvertiseClientUrlsAnnotationKey)
			continue
		}
		etcdEndpoints = append(etcdEndpoints, etcdEndpoint)
	}
	return etcdEndpoints, len(podList.Items), nil
}

We can see from the code above that it doesn't check the ready status of the pod. @neolit123

@neolit123
Member

We can see from the code above that it doesn't check the ready status of the pod. @neolit123

Why is this a problem that we need to fix?

@yxxhero
Member Author

yxxhero commented Feb 12, 2024

etcd-node001                                      0/1     CrashLoopBackOff    18074 (3m20s ago)   4d16h   1.1.1.1        node001     <none>           <none>
etcd-node002                                      1/1     Running             12 (6d3h ago)       69d     1.1.1.1       node002     <none>           <none>
etcd-node005                                      1/1     Running             10 (8d ago)         69d     1.1.1.1       node005     <none>           <none>
etcd-node003                                      1/1     Running             8 (19d ago)         67d     1.1.1.1       node003     <none>           <none>
etcd-node004                                      1/1     Running             7 (60d ago)         67d     1.1.1.1        node004     <none>           <none>

See the output above. Should we add the unhealthy pod as an etcd node? @neolit123 Thanks so much.

@neolit123
Member

I think getRawEtcdEndpointsFromPodAnnotationWithoutRetry is supposed to get the annotations even if the Pod is not ready. Readiness is checked during upgrade.

are you seeing this as an actual problem on your clusters?

@yxxhero
Member Author

yxxhero commented Feb 13, 2024

@neolit123 Makes sense. Maybe we can log something to notify the user?

@neolit123
Member

We could log with klog.V(3).Infof if the Pod is not ready. To my understanding, that is what you'd like to be changed?
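
For reference, a minimal sketch of that change (illustrative only, not the actual fix that landed via kubernetes/kubernetes#123489). isPodReady is a hypothetical local helper over pod.Status.Conditions, and v1 is assumed to be k8s.io/api/core/v1:

// Inside the Pod loop of getRawEtcdEndpointsFromPodAnnotationWithoutRetry,
// before reading the annotation, a low-verbosity hint could be emitted when
// the Pod is not ready:
//
//	if !isPodReady(&pod) {
//		klog.V(3).Infof("etcd Pod %q is not ready", pod.ObjectMeta.Name)
//	}

// isPodReady reports whether the Pod's standard PodReady condition is True.
// Illustrative helper only; kubeadm may already have an equivalent utility.
func isPodReady(pod *v1.Pod) bool {
	for _, cond := range pod.Status.Conditions {
		if cond.Type == v1.PodReady {
			return cond.Status == v1.ConditionTrue
		}
	}
	return false
}

The endpoint collection itself would stay unchanged, since the annotation is still needed from not-ready Pods; only the log line is added.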

@yxxhero
Member Author

yxxhero commented Feb 14, 2024

Yeah, it's OK for now. I will do that, alright? Thanks so much @neolit123
