
[TEP-0044] Add more details to alternatives 🔍 #559

Closed
wants to merge 1 commit

Conversation

@bobcatfish (Contributor) commented Nov 11, 2021

Make updates to TEP-0044 to get ready for proposing next steps:

  • Add more requirements around hermetic support and status information
  • Organize options into "implementation" and "syntax" options which can
    largely be mixed and matched - and I think we need to agree on the
    "implementation" (specifically the semantics of this feature)
    before we start focusing on syntax in this case
  • Update the "collapse task and pipeline into one thing" option to be a
    bit less theoretical ("foobar" as an example name might have obscured
    the option a bit)
  • Add explicit pipeline in a pod and pipeline as a taskrun options
  • At this point I think this TEP clearly overlaps with the TEP that was
    previously called TEP-0046 (pipeline in a pod)

@bobcatfish bobcatfish added the kind/tep Categorizes issue or PR as related to a TEP (or needs a TEP). label Nov 11, 2021
@tekton-robot tekton-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Nov 11, 2021
* We would be [limited in the features we could support](https://github.com/tektoncd/experimental/tree/main/pipeline-to-taskrun#supported-pipeline-features)
to features that TaskRuns already support, or we'd have to add more Pipeline features to TaskRuns.

##### TaskRun controller composes one pod from multiple TaskRuns
Contributor Author

@afrittoli with this option I tried to capture what you described in your comment in #447 (comment) - let me know if this is what you were getting at (I also might have run too far with it b/c ultimately I think you were saying you didn't like the Pipeline-directly-to-TaskRun option as a long term solution, which I agree with you about)

Member

Thanks for adding this!

Member

I've been thinking a bit more about this, and I like the idea of an intermediate resource for a group of TaskRuns. The TaskRunGroup controller could use the TaskRunGroup resource to hold data required for the definition / status of the Group as a whole. The TaskRunGroup would be created by the PipelineRun controller to manage a group of its tasks, and it would watch for updates on the Group only.

The TaskRunGroup could even decide to split the tasks it's responsible for into more than one Pod if a single Pod is too large to be scheduled, as long as the group setup allows for it (e.g. we could have policies about not creating PVCs - so stay on one pod or fail in case of shared storage).

[image: TaskRunGroup diagram]
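A minimal sketch of what such a hypothetical TaskRunGroup could look like - the kind name, API version, and every field below are assumptions for illustration, not anything that exists in Tekton today:

```yaml
# Hypothetical resource: nothing here is existing Tekton API.
apiVersion: tekton.dev/v1alpha1
kind: TaskRunGroup
metadata:
  name: ci-run-build-group
  labels:
    tekton.dev/pipelineRun: ci-run   # created and owned by the PipelineRun controller
spec:
  taskRuns:                          # the TaskRuns scheduled together as a group
    - name: ci-run-fetch
    - name: ci-run-build
  scheduling:
    colocate: true                   # try to run the whole group in one pod
    allowSplit: true                 # fall back to multiple pods if one pod can't be scheduled
    disallowPVCs: true               # example policy: fail instead of creating shared storage
status:
  pods:
    - ci-run-build-group-pod-0
```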

@skaegi (Contributor) commented Nov 15, 2021

/assign @afrittoli
/assign @jerop
/assign

@vdemeester (Member) left a comment

I think I am worried we are trying to solve too much at the same time — although it helps to see what problems are there.

I would love it if we could explore a solution that removes workspaces from the picture — that is, design a pipeline with tasks that do not use workspaces at all (for example, tasks that share data by using s3 inside their steps), and see what we can do to "Decouple Task Composition from Scheduling" independently of workspaces. And then, put workspaces back in and iterate.

Duplication is better than the wrong abstraction 👼🏼

Comment on lines +99 to +163
1. It must be possible for Tasks within a Pipeline that is run in this manner to be able to specify different levels
of [hermeticity](https://github.com/tektoncd/community/blob/main/teps/0025-hermekton.md)
- For example, a Task that fetches dependencies may need to run in non-hermetic mode, but the Task that actually
builds an artifact using these dependencies may need to be hermetic (e.g.
Member

I am not fully sure what the relation with the TEP is - can you provide more details? (In addition, it seems something is missing after "e.g." 😅)

Member

I agree there may be a relation with the "hermekton" TEP - as described in the example, tasks within a pipeline might have different levels of hermeticity requested, so the design in this TEP should allow for that, or alternatively it should exclude hermetic tasks (hopefully that won't be the case). Maybe just drop the "e.g." part or add more details?


* Functionality that could be supported with current pod logic (e.g. by
[translating a Pipeline directly to a TaskRun](#pipeline-executed-as-taskrun)):
* Sequential tasks (specified using [`runAfter`](https://github.com/tektoncd/pipeline/blob/main/docs/pipelines.md#using-the-runafter-parameter))
Member

Does it mean parallel tasks are not supported?
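For reference, the `runAfter` feature mentioned in the quoted lines is existing, documented Pipeline syntax; tasks with no ordering edge between them run in parallel, which is what the question above is getting at (task names below are illustrative):

```yaml
apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: example-pipeline
spec:
  tasks:
    - name: fetch
      taskRef:
        name: git-clone
    - name: lint                 # lint and unit-tests both run after fetch,
      runAfter: ["fetch"]        # in parallel with each other
      taskRef:
        name: run-lint
    - name: unit-tests
      runAfter: ["fetch"]
      taskRef:
        name: run-tests
```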

Comment on lines +199 to +259
* Functionality that could be supported with updated pod construction logic:
[Sidecars](https://github.com/tektoncd/pipeline/blob/main/docs/tasks.md#specifying-sidecars)
(may need to wrap sidecar commands such that sidecars don't start executing until their corresponding Task starts)
Member

There might also be a weird case if multiple tasks define a similar sidecar (the same name, for example).

Member

Having multiple tasks in a Pod would allow implementing - at least in some cases - "pipeline level" sidecars.
I've recently been working on enabling Kind-based testing in Tekton CI upstream.

With docker running in a sidecar it's possible to set up the k8s cluster in the same pod and connect to it locally, but the pipeline of test execution must be set up as bash scripts (install Tekton, run tests, run yaml tests). With pipeline in a pod it might become possible to run docker + kubernetes in the sidecar, and use tasks (instead of shell functions) for the Tekton installation and test execution.
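A rough sketch of the setup being described, using existing Task/sidecar syntax - the image names and script contents are assumptions, and today the "pipeline" of test steps has to live inside the step script:

```yaml
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: kind-e2e
spec:
  sidecars:
    - name: dind
      image: docker:dind                   # illustrative image
      securityContext:
        privileged: true
      volumeMounts:
        - name: docker-socket
          mountPath: /var/run
  steps:
    - name: run-e2e
      image: example.com/kind-test-runner  # hypothetical image with kind/kubectl/docker CLI
      volumeMounts:
        - name: docker-socket              # talk to the daemon over the shared socket
          mountPath: /var/run
      script: |
        kind create cluster
        # today the "pipeline" lives here as shell steps:
        #   install Tekton, run tests, run yaml tests
        # with pipeline-in-a-pod these could each be a Task sharing the sidecar
  volumes:
    - name: docker-socket
      emptyDir: {}
```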

@vdemeester (Member), Nov 29, 2021

Although I tend to see value there (in Pipeline sidecars), isn't this achievable today by starting a pod/deployment/service and removing it afterwards (in the finally)? It's not necessarily even more verbose 😅

Member

You can achieve that today, but you have to manage it in your pipeline, whereas it would be nice to have a more native construct for that. The other advantage of running everything in a pod for DinD is that you can rely on the socket to connect to the docker daemon and don't have to go through http/https to a port on a different Pod.

* Any dynamically created TaskRuns, e.g. dynamic looping based on Task results
(from [TEP-0090 matrix support](https://github.com/tektoncd/community/pull/532)) since
[a pod's containers cannot be updated](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.19/#podspec-v1-core)
* Running each Task with a different `ServiceAccount` - the pod has one ServiceAccount as a whole
Member

This is a pretty important one 😓

Member

Indeed. I think the PipelineRun controller should be able to decide what can / should run in a single Pod and what requires different Pods and it should notify users about this via the status.
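For context, per-task service accounts are already expressible on a PipelineRun today via `taskRunSpecs`, which is exactly the kind of per-task setting a single shared pod would have to reconcile with (resource names below are illustrative):

```yaml
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  name: ci-run
spec:
  pipelineRef:
    name: build-and-deploy
  serviceAccountName: default-sa           # used by every task unless overridden
  taskRunSpecs:
    - pipelineTaskName: deploy
      taskServiceAccountName: deployer-sa  # only the deploy task gets elevated credentials
```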

choices around somewhat independently of deciding
[what syntax we want to create to support this](#syntax-options).

#### More than one Task in one pod
Member

Nit, but why is it the only alternative at this level, while all the others are grouped under "other implementation options"?
This TEP is about "Decouple Task Composition from Scheduling"; more than one task in one pod is just one option among others, and I don't see it as the main possibility (yet?).

@@ -860,55 +995,71 @@ Pros:
Cons:
* The line between this and a Pipeline seems very thin
* New CRD to contend with
* Will mean that both the Pipelines controller and this controller will need to share (or duplicate) a lot of
Member

How is this a con? We share a lot of logic with knative, and it ends up being in a library (knative.dev/pkg).

disk
* We'd need to define an extension mechanism so folks can use whatever backing store they want
* Doesn't help us with the overhead of multiple pods (each Task would still be a pod)
* If this custom task is widely adopted, could fork our user community (or we'd have to decide to promote it to a
Member

Isn't this one of the goals of the CustomTask: to experiment so that we make sure we get the correct abstraction, and if we feel it needs to get into the API, we bring it in?


In this approach we could add more workspace types that support other ways of sharing data between pods, for example
uploading to and downloading from s3. (See [#290](https://github.com/tektoncd/community/pull/290/files).)
This approach is very similar to [Grouping CRD](#grouping-crd) but instead of creating a new type in the Tekton API,
Member

This is the exact same approach, to be honest. I would see the TaskGroup CRD being started as a CustomTask anyway 😝

Comment on lines 1022 to 1024
**Pursuing this option would be revisiting
[TEP-0074](https://github.com/tektoncd/community/blob/main/teps/0074-deprecate-pipelineresources.md)
and no longer deprecating PipelineResources.**
Member

Note: we "committed" to deprecating PipelineResources, so we need to decide on this before release 0.30 (to be able to revert the deprecation notice).


Cons:
* Helps with our conceptual problems but not with our efficiency problems
* The difference between a workspace teardown and finally could be confusing
* Not clear what the idea of a PipelineResource is really giving us if it's just a wrapper for Tasks
Member

+:100:


Most of the solutions above involve allowing more than one Task to be run in the same pod, and those proposals all share
the following pros & cons.
[Implementation options](#implementation-options):
Member

I really like the strategy of separating implementation options from syntax options here!


In this option we make it possible to express Tasks which can be combined together to run sequentially as one pod.
####### init/before finally/after
Member

I spent a while trying to figure out what this example does differently from what already exists (the concept of a pipeline "finally" task); could you add a short description like "Allow pipeline tasks to have 'before' tasks and 'after' tasks"?

The idea of having multiple tasks in "before" or "after" is also sort of confusing: are they run in parallel or sequentially? The former feels like pipelines in pipelines and the latter feels like it should just be a list of tasks.

Member

I think those would be on the Task (or TaskRun) itself, not on Pipeline objects (a bit similar to #502 or #369). The idea then would be:

  • They obey the Task law (all in sequence)
  • This is explicit: the user chooses what to run in the same pod or not
  • They would allow things such as a single TaskRun building an image based on some source (by injecting a git-clone task in there) - see the sketch below
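A purely hypothetical sketch of that idea - neither `before` nor `after` exists on TaskRuns today, and the field names are made up for illustration:

```yaml
apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  name: build-image
spec:
  before:                  # hypothetical: runs first, in the same pod, in sequence
    - taskRef:
        name: git-clone
  taskRef:
    name: kaniko-build     # the "main" task of this TaskRun
  after:                   # hypothetical: runs last, still in the same pod
    - taskRef:
        name: publish-results
```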


### Within the Task
##### Within the Task

In this option we ignore the "not at Task authoring time" requirement and we allow for Tasks to contain other Tasks.
Member

Is there a requirement that colocation isn't specified at authoring time? This line is confusing, as I don't see something similar in the requirements section, and many of the proposed alternatives are at authoring time.

@afrittoli (Member) left a comment

Thanks for all the updates!
I added a few comments - I hope you find them useful.


Comment on lines +103 to +166
1. Information about the status of individually executing Tasks must still be surfaced in a way such that
in the `status` of the PipelineRun, the execution of each individual Task (including its results) can be viewed
separately
Member

I agree that it should be possible to retrieve status, logs and results of individual tasks.

Right now those are exposed via the PipelineRun, but that's something we might want to remove in the future: as we introduce pipelines in pipelines, loops and other more complex features, it becomes more and more complex to maintain and it fills etcd with duplicate data.

Right now the ownership chain looks like this:

[image: current ownership chain diagram]

I wonder if we could have an intermediate API (a custom task like the experimental one, or maybe a TaskGroup), where the group CRD owns both the shared Pod and the TaskRuns:

[image: proposed ownership chain with a group resource owning the shared Pod and the TaskRuns]

In that approach, each Task's individual status and results would be available on its TaskRun. Logs would be shared on the Pod but still separated by container at least, and it would be possible for clients to extract those specific to a task.
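A sketch of what that could look like from the TaskRun side - the TaskGroup owner kind is hypothetical, while the status fields shown (podName, taskResults, steps) do exist on TaskRuns today; names and values are illustrative:

```yaml
apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  name: ci-run-unit-tests
  ownerReferences:
    - kind: TaskGroup                      # hypothetical intermediate resource
      name: ci-run-group
status:
  podName: ci-run-group-pod                # shared pod; logs still separated per container
  taskResults:
    - name: report-url
      value: https://example.com/report    # illustrative result value
  steps:
    - name: run-tests
      container: step-run-tests            # this task's container in the shared pod
```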

Comment on lines +109 to +171
so if the execution method ends up not being individual `TaskRuns` we may need to either create "fake" taskruns
to populate, or create a new section with different status information
Member

indeed :)


Comment on lines 286 to 293
##### Rely on the affinity assistant

In this approach we'd rely on [the affinity assistant](https://github.com/tektoncd/community/pull/318/files) to
co-locate pods that use the same workspace.

Cons:
* Doesn't help us with the overhead of multiple pods
* [TEP-0046](https://github.com/tektoncd/community/pull/318/files) explores shortcomings of this approach
Member

I think this is the "leave things as they are" alternative :)

In this option we make it possible to express Tasks which can be combined together to run sequentially as one pod
within a Pipeline.

####### Using Pipelines in Pipelines
Member

From the description below, this is a similar approach to running "parts" of a pipeline in a Pod, but instead of having the controller decide it, we ask users to explicitly set up chunks of a pipeline as pipelines to be executed in single-pod mode.
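A hypothetical sketch of that "explicit chunks" idea - the single-pod annotation and referencing a Pipeline from a pipeline task are not existing syntax, they are assumptions to make the shape concrete:

```yaml
apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: build-chunk
  annotations:
    tekton.dev/execution-mode: single-pod  # hypothetical: ask for one pod for this chunk
spec:
  tasks:
    - name: fetch
      taskRef:
        name: git-clone
    - name: build
      runAfter: ["fetch"]
      taskRef:
        name: kaniko-build
---
apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: ci
spec:
  tasks:
    - name: build-chunk
      pipelineRef:                         # hypothetical: pipelines in pipelines
        name: build-chunk
    - name: deploy
      runAfter: ["build-chunk"]
      taskRef:
        name: deploy
```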

Comment on lines +258 to +320
As the TaskRun controller (or some new controller for this specific purpose) reconciled the TaskRuns that make up the
Pipeline, it would need to:
1. Be able to save state across reconcile loops that would represent the eventual pod and update it as it reconciled.
Since the relevant fields in the pod itself [cannot be updated](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.19/#podspec-v1-core)
this would have to be done somewhere else, e.g. in memory in the controller or in `ConfigMap`.
Member

If we had a group CRD and group controller, the group controller would analyse the group when it's first reconciled, and it would create a Pod with one container for each TaskRun that might be executed as part of the group.
It would then start creating TaskRuns to share the current state with users. It would add a link to the created TaskRuns in its own status to trigger a reconcile of the PipelineRun controller and corresponding updates of the PipelineRun. Task results (within the group) could be stored on the TaskRuns as usual, or with the mechanism we come up with in the results TEP.
Any other group-orchestration-specific data that is needed by the Pod could be stored on the TaskGroup CRD itself - though I don't have an example in mind of what kind of data this could be.
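For the alternative in the quoted lines (saving the eventual pod's state outside the pod itself), a minimal sketch of what a `ConfigMap`-based approach might look like - the name and data layout are assumptions, not a real design:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ci-run-pending-pod-state
  labels:
    tekton.dev/pipelineRun: ci-run        # tie the accumulated state to its run
data:
  # one entry appended per reconciled TaskRun; the pod is created once all
  # TaskRuns in the group have been observed
  containers: |
    - taskRun: ci-run-fetch
      image: gcr.io/example/git-init
    - taskRun: ci-run-build
      image: gcr.io/example/kaniko
```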

@bobcatfish force-pushed the pipeline_pod_next_steps branch from 157a035 to b58eea9 on December 14, 2021 18:47
@tekton-robot (Contributor)

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign afrittoli
You can assign the PR to them by writing /assign @afrittoli in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tekton-robot tekton-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Dec 14, 2021
@lbernick (Member)
@vdemeester @afrittoli @skaegi @jerop I'm taking over this TEP from Christie; I tried to address the feedback that's been left so far in a new PR (#586) but please let me know if I've missed anything!

@dibyom (Member) commented Dec 20, 2021

Closing this since @lbernick is taking this over in #586

@dibyom dibyom closed this Dec 20, 2021
Labels
kind/tep Categorizes issue or PR as related to a TEP (or needs a TEP). size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
Status: Proposed
8 participants