
Design proposal: customize container log-tailing #1911

Conversation

corneliusweig
Contributor

No description provided.

Signed-off-by: Cornelius Weig <22861411+corneliusweig@users.noreply.github.com>
@codecov-io

codecov-io commented Apr 4, 2019

Codecov Report

Merging #1911 into master will increase coverage by 6.36%.
The diff coverage is n/a.


```diff
@@            Coverage Diff             @@
##           master    #1911      +/-   ##
==========================================
+ Coverage   49.75%   56.12%   +6.36%
==========================================
  Files         175      180       +5
  Lines        8084     7753     -331
==========================================
+ Hits         4022     4351     +329
+ Misses       3684     2986     -698
- Partials      378      416      +38
```
Impacted Files Coverage Δ
pkg/skaffold/build/local/docker.go 53.19% <0%> (-46.81%) ⬇️
cmd/skaffold/app/cmd/deploy.go 45.83% <0%> (-23.4%) ⬇️
pkg/skaffold/jib/jib.go 76.34% <0%> (-7.53%) ⬇️
cmd/skaffold/app/cmd/fix.go 65.62% <0%> (-4.38%) ⬇️
pkg/skaffold/color/formatter.go 80% <0%> (-3.34%) ⬇️
pkg/skaffold/schema/profiles.go 87.68% <0%> (-2.55%) ⬇️
pkg/skaffold/build/local/local.go 37.31% <0%> (-2.37%) ⬇️
pkg/skaffold/util/util.go 65.69% <0%> (-2.29%) ⬇️
pkg/skaffold/test/test.go 14.7% <0%> (-1.52%) ⬇️
pkg/skaffold/docker/parse.go 78.5% <0%> (-1.21%) ⬇️
... and 97 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0470bad...a9015ec. Read the comment docs.

@corneliusweig corneliusweig changed the title Add log selector design proposal Design proposal: customize container log-tailing Apr 9, 2019

@priyawadhwa priyawadhwa left a comment


Hey @corneliusweig, thanks for submitting this proposal and for your patience in getting it reviewed (I was out last week).

Just had one question for you!

```yaml
log:
# a list of deployment.spec.selector items, rules are OR'ed
- matchExpressions: # in one item, rules are AND'ed
```
Contributor

Just to clarify -- would I specify either matchExpressions or matchLabels? And will matchExpressions look for matches with labels on pods, or something else?

Contributor Author

@corneliusweig corneliusweig Apr 16, 2019

@priyawadhwa Yes, matchExpressions and matchLabels should be marked with yamltags:oneOf=selector or similar. I'll add that.
Both matchExpressions and matchLabels should have the same semantics as for deployments, so they both refer to labels -- only matchExpressions allows more logical operations than matchLabels does.

Contributor Author

On second thought, after reading kubectl explain deployment.spec.selector carefully again, I don't think we should place the oneOf constraint here. matchLabels is simply a shorthand for matchExpressions and can be trivially AND'ed with a matchExpressions object.
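To illustrate the shorthand relation, here is a small sketch (using the tail label from this proposal purely as an example): a matchLabels entry is equivalent to a matchExpressions item with the In operator, so the following two selectors match the same pods:

```yaml
# matchLabels form (shorthand)
selector:
  matchLabels:
    tail: 'true'
---
# equivalent matchExpressions form
selector:
  matchExpressions:
    - key: tail
      operator: In
      values: ['true']
```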

Contributor

Got it. The nice thing about matchLabels is that it's simpler, but matchExpressions is essentially a superset of matchLabels. I'm wondering if it would be cleaner to only have matchExpressions.

We could have something like

```yaml
- key: tail
  values: ['true']
```

and then default the operator to In so that it would be a bit simpler like matchLabels. WDYT?

Contributor Author

Not quite convinced.
If Skaffold wants to look like k8s selectors, we should strive to be as compliant as possible. Using similar names with deviating semantics (a different default operator) creates confusion. The great advantage of using the standard selector syntax and semantics is its recognizability from existing k8s configs.
Bottom line: if you really think we should deviate from the standard k8s config, then the new config should also look different. But why should it? I think k8s has found a sweet spot between configurability (matchExpressions) and simplicity (matchLabels).

Contributor

Fair point, I don't feel super strongly about it, so let's keep it the way you have it!

## Integration test plan

- Add a test case with two deployments, where one is selected based on the log spec and the other is excluded.
Make sure that the excluded pod does not show up in the logs.
Contributor

Sounds good!

@corneliusweig
Contributor Author

Hey @corneliusweig, thanks for submitting this proposal and for your patience in getting it reviewed (I was out last week).

@priyawadhwa No problem at all, I saw your great performance at Next'19. Well done!

At your convenience, please take another look.

Contributor

@priyawadhwa priyawadhwa left a comment

Thank you! Left a question for you.

@balopat
Contributor

balopat commented Apr 19, 2019

This looks promising - I have one clarifying question: we are talking about pod selectors here, but in the case of istio it is not an extra pod that we need to filter out but the sidecar container. Which raises the question: should we introduce both pod and container selection logic?

@corneliusweig
Contributor Author

corneliusweig commented Apr 20, 2019

@balopat Yeah, so true... this won't (yet) solve the istio problem.
I was thinking about how a configuration including container selection could look.

First of all, I think there is a fundamental difference between label selectors and selecting individual containers: labels are usually generic, so white-listing by label makes sense. Containers, on the other hand, have no generic names/labels, so white-listing them does not make sense. Therefore, I think the container selector should follow black-listing logic.

Thus, my favored option would be:

```yaml
log:
  - matchExpressions:
      - key: app
        values: ['traefik']
        operator: In
    matchLabels: # note: matchLabels and matchExpressions in one selector are AND'ed
      tail: 'true'
    excludeContainers:
    - 'istio-proxy'
    - 'istio*'   # allow glob patterns for the name
  - ...
```

Here, excludeContainers lists container names that should be excluded from log-tailing. Glob patterns for container names should be possible.

In addition, I could imagine that excludeContainers could default to some well-known infrastructure container names that a user is most likely not interested in. For example, istio-proxy.

WDYT?

Resolution: A UUID label is not required for now but can be added later when users request it.

## Implementation plan
1. Switch the selection of pods for the log selector from image lists to `tail=true` label (#1910).
Contributor

There's an issue with helm: labels are applied to resources after they are created, not before. This means that we won't see the logs for a short period of time.

Contributor Author

@dgageot

There's an issue with helm: labels are applied to resources after they are created, not before.

Do you mean this is a restriction of helm itself, or of how skaffold handles helm deployments?

Contributor

Because helm templates are resolved cluster-side, skaffold can't apply the labels to the yaml before sending it to k8s, the way it does with kubectl and kustomize.

Contributor Author

@corneliusweig corneliusweig Apr 23, 2019

I think this is a minor inconvenience. Once the label is applied by helm, the logs will be tailed starting from the global startTime (pkg/skaffold/kubernetes/log.go:50). I just tried this with the following patch, based on PR #1910:

```diff
diff --git a/examples/helm-deployment/Dockerfile b/examples/helm-deployment/Dockerfile
index c1c9be3e..2307eff6 100644
--- a/examples/helm-deployment/Dockerfile
+++ b/examples/helm-deployment/Dockerfile
@@ -1,3 +1,7 @@
-FROM nginx:stable
+FROM golang:1.10.1-alpine3.7 as builder
+COPY main.go .
+RUN go build -o /app main.go
 
-COPY static /usr/share/nginx/html/
+FROM alpine:3.7
+CMD ["./app"]
+COPY --from=builder /app .
diff --git a/examples/helm-deployment/main.go b/examples/helm-deployment/main.go
new file mode 100644
index 00000000..46d5cbd9
--- /dev/null
+++ b/examples/helm-deployment/main.go
@@ -0,0 +1,14 @@
+package main
+
+import (
+	"fmt"
+	"time"
+)
+
+func main() {
+	for i := 0; ; i++ {
+		fmt.Printf("Hello world! %d\n", i)
+
+		time.Sleep(time.Second * 1)
+	}
+}
diff --git a/examples/helm-deployment/skaffold-helm/templates/deployment.yaml b/examples/helm-deployment/skaffold-helm/templates/deployment.yaml
index ad989438..f1b4442b 100644
--- a/examples/helm-deployment/skaffold-helm/templates/deployment.yaml
+++ b/examples/helm-deployment/skaffold-helm/templates/deployment.yaml
@@ -14,6 +14,7 @@ spec:
       labels:
         app: {{ template "skaffold-helm.name" . }}
         release: {{ .Release.Name }}
+        tail: "true"
     spec:
       containers:
         - name: {{ .Chart.Name }}
```

All logs starting from iteration 0 are shown when using the label selector approach.
Therefore, there will be a small delay until the logs are shown, but nothing will be lost.

What bothers me, though, is that helm deployments don't seem to apply the tail=true label to deployed pods. I had to apply it manually in the above patch. Is that intentional?

Contributor Author

So this is a real problem; see #2074 for further information. I also opened PR #2075, which would fix this so that label selection with helm deployments works again.

Signed-off-by: Cornelius Weig <22861411+corneliusweig@users.noreply.github.com>
@priyawadhwa
Contributor

@corneliusweig that makes sense to me; in this case we include a container's logs if:

  1. it matches either an expression or a label
  2. it isn't included in excludeContainers

I'm good with that, @balopat WDYT?

Signed-off-by: Cornelius Weig <22861411+corneliusweig@users.noreply.github.com>
@corneliusweig
Contributor Author

corneliusweig commented May 14, 2019

@balopat @priyawadhwa Have you had time to discuss the log-tailing config? Do you think it's worth implementing?

@tejal29
Contributor

tejal29 commented May 16, 2019

@corneliusweig after reading #1911 (comment), I am wondering: do we need pod-selection logic at all?
You listed 2 use cases:

  1. It does not allow adding pods to the log aggregator which are not deployed by Skaffold (Combined logger pod selector, #666).
  2. It does not allow excluding uninteresting pods deployed by Skaffold. For example, when working with istio, all the irrelevant istio logs spam the log (Ignore some containers in logs, #588; Allow logs to be excluded from skaffold dev output (by regexp or by pod), #1991).

For the 2nd use case:
If I am a k8s developer, I write code which produces an image.
The image gets run in one or many pods, which may belong to services, deployments, cron jobs, etc.
I would only want log messages from those containers, which we already provide a way to specify via the ImageList.

Maybe we could fix the way we tail logs.
In log.go we go through all pod containers and check if they are steady.
If a container is, it is added to trackedContainers and we stream its logs. Maybe we should only track a container if its image is in the ImageList?

```go
	for _, container := range pod.Status.ContainerStatuses {
		// Add filter based on `container.Image` field.
		if container.ContainerID == "" {
			if container.State.Waiting != nil && container.State.Waiting.Message != "" {
				color.Red.Fprintln(a.output, container.State.Waiting.Message)
			}
			continue
		}

		if !a.trackedContainers.add(container.ContainerID) {
			go a.streamContainerLogs(cancelCtx, pod, container)
		}
	}
```

Now, regarding the first use case, which is "user-defined additional pod logs":

  1. for deploy, the way to deploy prebuilt images is via the --images flag.
  2. for port-forwarding additional pods, we are thinking of doing the same (/cc @priyawadhwa).

Maybe we should do the same for logging, where users can define additional images they want logs from? This could be a command-line flag.

What do you think?

@nkubala
Contributor

nkubala commented May 16, 2019

With the helm labeling improvements and the first parts of the implementation for this underway, I think we're about ready to merge this proposal. To me, the logic of excluding containers from the logs if they match the provided regex but are specified in the excludeContainers stanza makes sense.

@balopat @dgageot @priyawadhwa anyone have any final objections?

@priyawadhwa
Contributor

priyawadhwa commented May 16, 2019

@nkubala sgtm

@tejal29 I like that idea, but the design is slightly different from what is proposed here. For container exclusion, we would check whether the image is in excludeContainers?

@tejal29
Contributor

tejal29 commented May 16, 2019

@priyawadhwa, @corneliusweig Sorry if it wasn't clear.
My comment was a polite nudge towards re-evaluating whether we need this extra configuration.
I feel we might not need it if we fix the way we stream pod logs right now.

@priyawadhwa
Contributor

@tejal29 @corneliusweig I do like the idea of minimal configuration. I think @tejal29's idea would work, except we would still need to add some additional support for excluding containers that skaffold thinks it should log.

@corneliusweig WDYT?

@corneliusweig
Contributor Author

It's too late here. I'll follow up tomorrow.

@corneliusweig
Contributor Author

@tejal29 Alright, sorry for the delay.
Let me first restate your suggestion so that we know whether I understood you correctly:
you propose to keep the existing ImageList mechanism, but to only select those containers from relevant pods whose image was built by Skaffold. In addition, there should be command-line flags to include additional pods in logging which are not managed by Skaffold.

Overall, I find this an elegant extension of the existing functionality. It covers most of the use cases which were suggested for the log-tailing customization. The one it does not fully cover is when Skaffold builds and manages a pod, but the user is generally not interested in its logs (#1991).

However, I have other more fundamental problems with the ImageList approach which in my opinion call for some refactoring:

  • For one, the ImageList is quite complex to maintain. The concern of keeping the ImageList up to date is scattered across the code. Consequently, the PR for the label selector approach reduces the line count, with 127 insertions(+) and 99 deletions(-) (not including tests). So overall I think the label selector will be easier to maintain and is also more straightforward to understand.
  • Further, the recommended way to select resources in k8s is via labels. I think the reason why Skaffold currently uses the ImageList is simply that labelling was not working properly when log-tailing was introduced. Otherwise it would have used labels right from the start (correct me if I'm wrong).
  • Finally, selecting pods based on labels also integrates well with other k8s tools, such as kubectl logs -l skaffold.dev/tail=true. If those results should be consistent, Skaffold should also select based on labels and not on a custom solution.

In addition, there are advantages of making the log configuration explicit:

  • principle of least surprise: I could have known better, but I was a bit surprised to see that pods which are deployed by Skaffold are not log-tailed if their image is not built by Skaffold. A quick search of skaffold.dev also revealed that this is not documented.
  • putting the log configuration in the skaffold.yaml also allows sharing a logging configuration. This is not possible with CLI flags. Besides, I think Skaffold already has a lot of flags, and adding more makes the tool more difficult to use.
  • a config section also makes the functionality more discoverable.

Overall, I think label based pod selection has a lot of advantages.


I'm still thinking about how to incorporate your idea to only tail those containers in a pod whose image is built by Skaffold. In spirit, this is a white-list approach based on the container image. However, I don't think it can work with the proposed logging config, because when I add an additional pod to the set of pods for log-tailing, I would additionally have to include its image in this whitelist. So to tail logs from an additional pod, I would have to make two independent adjustments, which is an awful user experience.

@tejal29
Contributor

tejal29 commented May 20, 2019

Thanks @corneliusweig for the detailed explanation.

My proposal was to stay away from "pods" and to control this config at the container level.

  1. When skaffold dev is invoked, skaffold tails logs from all containers which it built.
  2. If users want to tail logs from additional containers, like a redis or mysql container, they specify them on the command line.

The reasons I want to stay away from pods vs. containers:

  1. Within a pod, not all containers are interesting, e.g. "istio" or any other sidecar.
  2. We build containers, and sidecars like "istio" are cluster-specific. They might not exist on a dev cluster.
  3. Users will need to keep the skaffold logging config in sync with the k8s manifest labels (not a big deal).

Regarding your points:

In addition, there are advantages of making the log configuration explicit:

  • element of least surprise: I could have known better, but I was a bit surprised to see that pods which are deployed by Skaffold are not log-tailed if their image is not built by Skaffold. A quick search of skaffold.dev also revealed that this is not documented.

I am ok with this since we are logging containers which users developed.

  • putting the log configuration in the skaffold.yaml also allows to share a logging configuration. This is not possible with CLI flags. Besides, I think Skaffold already has a lot of flags and adding more makes the tool more difficult to use

I like the sharing aspect. I don't mind adding a logging config instead of a command-line flag. Could we only add additional containers to log in the skaffold config, instead of the label selector logic?

  • a config section also makes the functionality more discoverable

Agree. Same comment as above: my concern is whether it should be at the pod level or the container-image level.

Regarding your observations:

However, I have other more fundamental problems with the ImageList approach which in my opinion call for some refactoring:

For one, the ImageList is quite complex to maintain. The concern to keep the imageList up to date is quite scattered over the code. Consequently, the PR for the label selector approach reduces the line count with 127 insertions(+) and 99 deletions(-) (not including tests). So overall I think the label selector will be easier to maintain and is also more straightforward to understand.

The ImageList is right now just the list of artifacts skaffold builds, along with any additional images. We already maintain it for skaffold deploy, and we would have to pass it around.

Further, the recommended way to select some resource in k8s is via labels. I think the reason why Skaffold is currently using ImageList is simply that the labelling was not properly working when the log-tailing was introduced. Otherwise it would have used labels right from the start (correct me if I'm wrong).

I am not aware of the history here either. :( Maybe someone else can enlighten us.

Finally, selecting pods based on labels also integrates well with other k8s tools, such as kubectl logs -l skaffold.dev/tail=true. If those results should be consistent, Skaffold should also select based on labels and not on a custom solution.

This is true. I am just worried that we are adding a bunch of config for that to happen. With Skaffold we are making the "container development" experience better for k8s application developers. Containers are the unit of building, deploying, and debugging for us.
Meanwhile, kubectl's focus is to help users interact with Kubernetes clusters easily, and the Pod is the unit of k8s.
I might be totally wrong here :) but I feel our focus should be on "containers" vs. "pods".
I also feel that with this we are digging into "What should the Skaffold philosophy be?"
Should Skaffold be the tool where you can do everything: building, deploying, interacting with the k8s cluster like kubectl does? Where should we draw the line?

@corneliusweig
Contributor Author

Hey @tejal29, sorry for my delayed response, I was quite busy these last few days.

You are raising some important concerns which should be clarified before discussing the implementation details.

Should skaffold focus on containers or pods?

You are right that Skaffold is a tool to ease container development. On the other hand, containers are deployed as pods. So as soon as it comes to deployment, Skaffold should IMHO not ignore or try to hide the fact that the container is now part of a k8s pod. After all, Skaffold also helps to run containers (as pods), and at that point it's the pod that needs to function, not just the container inside it. Skaffold also ensures quick feedback cycles when changing a configuration manifest, which clearly belongs to the realm of pods and not just containers. My feeling is that trying to hide away the pod and just focus on the container inside it will cause trouble. For example, what about init containers? It is very common to use pre-built containers to check for database connectivity. If the focus is on built containers only, logs from init containers would never be shown, but some users have already requested changing that (also see #1865).

In the end, I think this is something you need to discuss internally, because it is a decision about where you are headed with Skaffold. @tejal29, can you bring this up in one of your meetings? I'll most likely not be able to join the bi-weekly on Wednesday.

@balopat
Contributor

balopat commented Sep 18, 2019

Thoughts/ideas from today's Office Hours:

  • we need to list explicitly the use cases / user problems that we want to solve
  • going with a generic model like k8s resource matching might be too much for a user; it is also different from the user-defined port-forwarding spec
  • maybe we could flip the config: additional-resources or monitor-resources, and then under the pod coordinates we could have both a port-forwarding spec (specified by port) and a log-tailing spec (specified by container name)
  • container exclusion could default to tailing logs from only skaffold-built containers - that might just solve all the "sidecar" log issues?
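A rough, purely hypothetical sketch of the "flipped" config idea from the office hours notes (every key name below is invented for illustration and was not agreed on):

```yaml
# hypothetical flipped config: resources first, then per-resource
# port-forwarding and log-tailing specs
monitor-resources:
  - kind: deployment
    name: skaffold-example
    port-forward:
      - port: 8080
    log:
      containers: ['skaffold-example']   # by container name
```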

@corneliusweig
Contributor Author

To be realistic, I won't find time to work on this anytime soon. Also, the discussion in the office hours took the design in a very different direction, so a revised version of this proposal should be discussed in a different PR. Therefore it's overdue to be closed now.

But if you miss this capability, give a +1 on the original PR description so that we can see whether it should be prioritized.

@nkubala
Contributor

nkubala commented Jan 8, 2020

@corneliusweig just wanted to say thanks for all your work here; this drove some really great conversations that made us think hard about the way we've designed log tailing in skaffold. You rock!

@corneliusweig
Contributor Author

@nkubala ❤️ I think you are giving me more credit than I deserve. I'm still a bit sad that I didn't have the time to push it through. But it sounds like you are picking up some of the ideas from here, so the work was not in vain 🎊
