Initial prototype for pod health check hook up #4223

Merged
tejal29 merged 14 commits into GoogleContainerTools:master from hook_pod_check
Jun 5, 2020

Conversation

tejal29
Contributor

@tejal29 tejal29 commented May 19, 2020


Description
This is the first iteration of hooking up the pod health check with Skaffold. With this change:

  • deployment.CheckStatus queries for pods using the labels defined in deployment.Template.Labels.
  • deployment.LastReportedStatus now prints both deployment and pod statuses.
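
As a rough illustration of the first bullet, here is a minimal sketch of turning a pod template's labels into an equality-based label selector string. The helper name `selectorFromLabels` and the `app: leeroy-app` label are assumptions made for this example; they are not taken from the PR's code.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// selectorFromLabels (hypothetical helper) turns a label map into an
// equality-based selector string such as "app=leeroy-app,tier=backend".
// Keys are sorted so the result is deterministic.
func selectorFromLabels(labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, k+"="+labels[k])
	}
	return strings.Join(parts, ",")
}

func main() {
	// Assumed labels from a deployment's pod template.
	templateLabels := map[string]string{"app": "leeroy-app"}
	fmt.Println(selectorFromLabels(templateLabels))
}
```

A selector string of this shape is what one would pass when listing pods that belong to a deployment.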

Some of these changes have already been split out into smaller PRs.

There are still a couple of opportunities to make this smaller (will follow up).
Please review the output.

User facing changes (remove if N/A)
yes.

Waiting for deployments to stabilize...
 - deployment/leeroy-app: waiting for rollout to finish: 1 old replicas are pending termination...
    - pod/leeroy-app-85dfb77b44-md2dd: creating container leeroy-app
 - deployment/leeroy-web: waiting for rollout to finish: 1 old replicas are pending termination...
    - pod/leeroy-web-6f9f98f98b-64wkk: creating container leeroy-web
 - deployment/leeroy-web is ready. [1/2 deployment(s) still pending]
 - deployment/leeroy-app: waiting for rollout to finish: 1 old replicas are pending termination...
    - pod/leeroy-app-85dfb77b44-md2dd: container leeroy-app is waiting to start: leeroy-app1 can't be pulled
 - deployment/leeroy-app: waiting for rollout to finish: 1 old replicas are pending termination...
    - pod/leeroy-app-85dfb77b44-md2dd: container leeroy-app in error: Back-off pulling image "leeroy-app1"
 - deployment/leeroy-app: waiting for rollout to finish: 1 old replicas are pending termination...
...

Add event stream!


Follow-up Work (remove if N/A)

@balopat
Contributor

balopat commented May 20, 2020

For me, when the pod spec has the wrong image, "waiting to start, image can't be pulled" does not appear, just "creating container xyz":

Waiting for deployments to stabilize...
 - deployment/leeroy-app: waiting for rollout to finish: 0 of 1 updated replicas are available...
    - pod/leeroy-app-54cd96c94d-m7jgr: creating container leeroy-app
 - deployment/leeroy-web: waiting for rollout to finish: 0 of 1 updated replicas are available...
    - pod/leeroy-web-594b7fcd58-h7sxd: creating container leeroy-web
 - deployment/leeroy-web is ready. [1/2 deployment(s) still pending]

And then it times out.

@tejal29 tejal29 marked this pull request as ready for review May 26, 2020 19:02
@tejal29
Contributor Author

tejal29 commented May 26, 2020

> For me, when the pod spec has the wrong image, "waiting to start, image can't be pulled" does not appear, just "creating container xyz":
>
> Waiting for deployments to stabilize...
>  - deployment/leeroy-app: waiting for rollout to finish: 0 of 1 updated replicas are available...
>     - pod/leeroy-app-54cd96c94d-m7jgr: creating container leeroy-app
>  - deployment/leeroy-web: waiting for rollout to finish: 0 of 1 updated replicas are available...
>     - pod/leeroy-web-594b7fcd58-h7sxd: creating container leeroy-web
>  - deployment/leeroy-web is ready. [1/2 deployment(s) still pending]
>
> And then it times out.

Sorry, just saw this now. I was able to see the error you mentioned,
#4223 (comment)

Let me rebase this branch from master and report back.

@codecov

codecov bot commented May 27, 2020

Codecov Report

Merging #4223 into master will decrease coverage by 0.83%.
The diff coverage is 60.00%.

| Impacted Files | Coverage Δ | |
|---|---|---|
| pkg/skaffold/deploy/resource/status.go | 90.00% <50.00%> (-10.00%) | ⬇️ |
| pkg/skaffold/deploy/resource/deployment.go | 76.84% <57.14%> (-16.50%) | ⬇️ |
| pkg/skaffold/deploy/status_check.go | 52.17% <77.77%> (+0.29%) | ⬆️ |
| pkg/skaffold/server/server.go | 48.88% <0.00%> (ø) | |
| pkg/diag/validator/pod.go | 1.78% <0.00%> (ø) | |
| pkg/diag/validator/resource.go | 0.00% <0.00%> (ø) | |
| pkg/diag/diag.go | 70.58% <0.00%> (ø) | |

@tejal29
Contributor Author

tejal29 commented May 27, 2020

@balopat commit 139bef9 ensures that the deployment status changes when pod statuses change.

../../out/skaffold dev -d gcr.io/tejal-test
Listing files to watch...
 - leeroy-web
 - leeroy-app
Generating tags...
 - leeroy-web -> gcr.io/tejal-test/leeroy-web:v1.10.0-43-ga91b3f75a
 - leeroy-app -> gcr.io/tejal-test/leeroy-app:v1.10.0-43-ga91b3f75a-dirty
Checking cache...
 - leeroy-web: Found Remotely
 - leeroy-app: Found Remotely
Tags used in deployment:
 - leeroy-web -> gcr.io/tejal-test/leeroy-web:v1.10.0-43-ga91b3f75a@sha256:3aff2363e7901c39241f3b89d21edf56e6f6a77ad5d65396937d39584f4665db
 - leeroy-app -> gcr.io/tejal-test/leeroy-app:v1.10.0-43-ga91b3f75a-dirty@sha256:9335d05e355c1db3a99307587d11b8f54186be7e8a9c39ea7cb9a260c0418f1a
Starting deploy...
WARN[0002] image [leeroy-app] is not used by the deployment 
 - deployment.apps/leeroy-web configured
 - service/leeroy-app configured
 - deployment.apps/leeroy-app configured
Waiting for deployments to stabilize...
 - deployment/leeroy-app: waiting for rollout to finish: 1 old replicas are pending termination...
    - /leeroy-app-765dbf4496-shnfc: creating container leeroy-app
    - pod/leeroy-app-765dbf4496-shnfc: creating container leeroy-app
 - deployment/leeroy-web: waiting for rollout to finish: 1 old replicas are pending termination...
    - /leeroy-web-7bd8f4cf85-vbnqh: creating container leeroy-web
    - pod/leeroy-web-7bd8f4cf85-vbnqh: creating container leeroy-web
 - deployment/leeroy-web is ready. [1/2 deployment(s) still pending]
 - deployment/leeroy-app: waiting for rollout to finish: 1 old replicas are pending termination...
    - /leeroy-app-765dbf4496-shnfc: creating container leeroy-app
    - pod/leeroy-app-765dbf4496-shnfc: creating container leeroy-app
 - deployment/leeroy-app: waiting for rollout to finish: 1 old replicas are pending termination...
    - pod/leeroy-app-765dbf4496-shnfc: container leeroy-app is waiting to start: leeroy-app1 can't be pulled
    - /leeroy-app-765dbf4496-shnfc: container leeroy-app is waiting to start: leeroy-app1 can't be pulled
 - deployment/leeroy-app: waiting for rollout to finish: 1 old replicas are pending termination...
    - pod/leeroy-app-765dbf4496-shnfc: container leeroy-app in error: Back-off pulling image "leeroy-app1"
    - /leeroy-app-765dbf4496-shnfc: container leeroy-app in error: Back-off pulling image "leeroy-app1"
 - deployment/leeroy-app: waiting for rollout to finish: 1 old replicas are pending termination...
    - /leeroy-app-765dbf4496-shnfc: container leeroy-app is waiting to start: leeroy-app1 can't be pulled
    - pod/leeroy-app-765dbf4496-shnfc: container leeroy-app is waiting to start: leeroy-app1 can't be pulled
 - deployment/leeroy-app: waiting for rollout to finish: 1 old replicas are pending termination...
    - /leeroy-app-765dbf4496-shnfc: container leeroy-app in error: Back-off pulling image "leeroy-app1"
    - pod/leeroy-app-765dbf4496-shnfc: container leeroy-app in error: Back-off pulling image "leeroy-app1"
 - deployment/leeroy-app: waiting for rollout to finish: 1 old replicas are pending termination...
    - /leeroy-app-765dbf4496-shnfc: container leeroy-app is waiting to start: leeroy-app1 can't be pulled
    - pod/leeroy-app-765dbf4496-shnfc: container leeroy-app is waiting to start: leeroy-app1 can't be pulled
 - deployment/leeroy-app: waiting for rollout to finish: 1 old replicas are pending termination...
    - /leeroy-app-765dbf4496-shnfc: container leeroy-app in error: Back-off pulling image "leeroy-app1"
    - pod/leeroy-app-765dbf4496-shnfc: container leeroy-app in error: Back-off pulling image "leeroy-app1"
 - deployment/leeroy-app: waiting for rollout to finish: 1 old replicas are pending termination...
    - /leeroy-app-765dbf4496-shnfc: container leeroy-app is waiting to start: leeroy-app1 can't be pulled
    - pod/leeroy-app-765dbf4496-shnfc: container leeroy-app is waiting to start: leeroy-app1 can't be pulled
 - deployment/leeroy-app: waiting for rollout to finish: 1 old replicas are pending termination...
    - /leeroy-app-765dbf4496-shnfc: container leeroy-app in error: Back-off pulling image "leeroy-app1"
    - pod/leeroy-app-765dbf4496-shnfc: container leeroy-app in error: Back-off pulling image "leeroy-app1"
 - deployment/leeroy-app failed. Error: could not stabilize within 2m0s: context deadline exceeded.
Cleaning up...
 - deployment.apps "leeroy-web" deleted
 - service "leeroy-app" deleted
 - deployment.apps "leeroy-app" deleted
exiting dev mode because first deploy failed: 1/2 deployment(s) failed
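
The behavior described in this comment (a deployment is reported again whenever one of its pod statuses changes) can be sketched roughly as follows. All type and function names here (`podStatus`, `deploymentStatus`, `tracker`, `update`) are hypothetical and simplified for illustration; they are not the types used in commit 139bef9.

```go
package main

import "fmt"

// podStatus is a hypothetical, simplified view of one pod's state.
type podStatus struct {
	Name    string
	Message string
}

// deploymentStatus is a hypothetical snapshot of a deployment and its pods.
type deploymentStatus struct {
	Details string
	Pods    []podStatus
}

// equal reports whether two snapshots carry the same details and the same
// pod statuses in the same order.
func (s deploymentStatus) equal(o deploymentStatus) bool {
	if s.Details != o.Details || len(s.Pods) != len(o.Pods) {
		return false
	}
	for i := range s.Pods {
		if s.Pods[i] != o.Pods[i] {
			return false
		}
	}
	return true
}

// tracker remembers the last snapshot that was reported.
type tracker struct {
	last     deploymentStatus
	reported bool
}

// update stores next and returns true when it differs from the last
// reported snapshot, meaning the caller should print it.
func (t *tracker) update(next deploymentStatus) bool {
	if t.reported && t.last.equal(next) {
		return false
	}
	t.last = next
	t.reported = true
	return true
}

func main() {
	rollout := "waiting for rollout to finish: 1 old replicas are pending termination..."
	snapshots := []deploymentStatus{
		{Details: rollout, Pods: []podStatus{{"pod/leeroy-app", "creating container leeroy-app"}}},
		{Details: rollout, Pods: []podStatus{{"pod/leeroy-app", "creating container leeroy-app"}}},
		{Details: rollout, Pods: []podStatus{{"pod/leeroy-app", "container leeroy-app is waiting to start"}}},
	}
	t := &tracker{}
	for _, s := range snapshots {
		if t.update(s) {
			fmt.Printf(" - deployment/leeroy-app: %s\n", s.Details)
			for _, p := range s.Pods {
				fmt.Printf("    - %s: %s\n", p.Name, p.Message)
			}
		}
	}
}
```

In this sketch the second snapshot is suppressed because nothing changed, while the third is printed even though the deployment-level message is identical, since a pod message differs. The comparison is order-sensitive, so a real implementation would likely sort pods or key them by name first.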

Member

@briandealwis briandealwis left a comment

Gave it a try with the cloud-code-samples, and although I think we can tweak some things, it's looking good!

Waiting for deployments to stabilize...
 - deployment/java-guestbook-backend: waiting for rollout to finish: 0 of 1 updated replicas are available...
    - pod/java-guestbook-backend-5df8f495b-tcql8: container init-db-ready in error: 
 - deployment/java-guestbook-frontend: waiting for rollout to finish: 0 of 1 updated replicas are available...
    - pod/java-guestbook-frontend-ffd5d6495-fkmq9: creating container frontend
 - deployment/java-guestbook-mongodb: waiting for rollout to finish: 0 of 1 updated replicas are available...
    - pod/java-guestbook-mongodb-868948f9bb-gbbzp: creating container mongo
 - deployment/java-guestbook-frontend is ready. [2/3 deployment(s) still pending]
 - deployment/java-guestbook-mongodb is ready. [1/3 deployment(s) still pending]
 - deployment/java-guestbook-backend: waiting for rollout to finish: 0 of 1 updated replicas are available...
    - pod/java-guestbook-backend-5df8f495b-tcql8: container init-db-ready terminated with exit code 0
 - deployment/java-guestbook-backend is ready.
Deployments stabilized in 31.796627999s

@@ -105,12 +105,12 @@ func (d *Deployment) CheckStatus(ctx context.Context, runCtx *runcontext.RunCont
	}

	details := d.cleanupStatus(string(b))
	d.UpdateStatus(details, err)

	err = parseKubectlRolloutError(err)
	if err == errKubectlKilled {
		err = fmt.Errorf("received Ctrl-C or deployments could not stabilize within %v: %w", d.deadline, err)
Member

It would be nice to be able to separate these conditions.
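
One possible way to separate the two conditions is to inspect the context's own error once kubectl has been killed: `context.Canceled` suggests a user interrupt, while `context.DeadlineExceeded` suggests the status-check deadline ran out. The sketch below assumes the interrupt is delivered by cancelling the context; `explainKilled`, the demo helpers, and the stand-in `errKubectlKilled` value are hypothetical and not Skaffold's actual code.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errKubectlKilled is a stand-in for the sentinel error in the diff above.
var errKubectlKilled = errors.New("kubectl rollout status was killed")

// explainKilled (hypothetical) wraps a "killed" error with a message that
// depends on why the context ended. Other errors pass through unchanged.
func explainKilled(ctx context.Context, deadline time.Duration, err error) error {
	if !errors.Is(err, errKubectlKilled) {
		return err
	}
	switch {
	case errors.Is(ctx.Err(), context.DeadlineExceeded):
		return fmt.Errorf("deployment could not stabilize within %v: %w", deadline, err)
	case errors.Is(ctx.Err(), context.Canceled):
		return fmt.Errorf("received Ctrl-C: %w", err)
	default:
		return err
	}
}

// demoCanceled simulates a user interrupt by cancelling the context.
func demoCanceled() error {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	return explainKilled(ctx, 2*time.Minute, errKubectlKilled)
}

// demoTimedOut simulates the deadline expiring before the rollout finished.
func demoTimedOut() error {
	ctx, cancel := context.WithTimeout(context.Background(), time.Millisecond)
	defer cancel()
	<-ctx.Done() // wait for the short timeout to elapse
	return explainKilled(ctx, 2*time.Minute, errKubectlKilled)
}

func main() {
	fmt.Println(demoCanceled())
	fmt.Println(demoTimedOut())
}
```

Because both messages wrap the original error with `%w`, callers can still match on `errKubectlKilled` with `errors.Is` while users see which condition occurred.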

@tejal29 tejal29 merged commit adf6da6 into GoogleContainerTools:master Jun 5, 2020
@tejal29 tejal29 deleted the hook_pod_check branch April 15, 2021 07:34