Add support for remote cluster deployments #42
Conversation
Signed-off-by: thepetk <thepetk@gmail.com>
@thepetk what if someone wants to deploy to multiple namespaces remotely? Do we need to ensure all of these namespaces are configured beforehand?
For now, as a first iteration of this new feature, let's keep it simple and keep the same name. I am thinking of edge cases where someone can specify different namespaces on different clusters, but there might be duplicate resources already present and it may result in failures. I think we should probably think a bit more. Any other concerns?
Totally agree, and pretty much this is the reason for using the same namespace. I feel too that many things might change, even once we finalize the updates on the software template side.
@maysunfaisal this PR also made me think about this: how do we feel about a separate tekton argoCD application for all cases? I'm thinking out loud here, but we could set up a specific namespace on the RHDH cluster where all pipelineRuns run. This could potentially reduce the number of secrets we spread across different namespaces (with the initialize-namespace job), right? I was curious to see wdyt. The main reason I'm thinking this is because right now we initialize the namespace (if I get it right), and a user who has access only to the software template namespace gets access to organization-wide secrets (e.g. the quay token). So using a namespace with restricted access to run the pipelineRuns might reduce this spread.
Yes, I think so. At least for the current PR I feel it is out of scope. We could potentially provide a way through an installer script to ease this process (e.g. automatically prepare the namespace), but at the Software Template level I feel it should be a requirement on the remote side to have everything ready so they can accept deployments of the templates.
Code generally looks good to me, just need to take some time and try it out as well :)
Interesting thought. I think we can open an issue to track the idea. My only concern would be protecting the pipeline resource names from clashing with so many other pipelines in the same namespace. I think we can raise it as a parking lot topic.
Yeah +1, I just wanted to first see if there was anything missing from my train of thought that would make this a big NO-GO. I'll raise it as a parking lot item.
Are we able to get a detailed instruction set for how to set everything up to utilize the remote cluster? I believe it would also make it easier for others looking to give it a try for review.
The truth is the setup here is quite complicated. I have tried to capture everything in the PR description, but ofc I can have a more detailed version tomorrow if that helps! Let me know!
@thepetk I agree with it being complicated. I think having an even more detailed version, with maybe some examples or templated steps, would help those reviewing, and it can be used for the actual setup for end users.
It probably makes sense to write up documentation, because we can then point whoever is eventually going to use this to that documentation. It should perhaps be part of the Acceptance Criteria.
@Jdubrick I tried my best <3. I went through the previous steps and added more details to each one of them. In regards to the actual setup for end users I partially agree, as this flow is a good start for documentation, but in general it should be covered as part of RHDHPAI-622.

Detailed instructions for reviewers

Cluster Setup
```yaml
- authProvider: serviceAccount
  name: <sa name>
  serviceAccountToken: <sa token>
  skipTLSVerify: true
  url: https://api.mycluster.p3.openshiftapps.com/
```
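The `<sa name>` and `<sa token>` placeholders above refer to a service account on the remote cluster. As a sketch (the service account name `rhdh-remote`, the namespace, and the role binding are assumptions, not taken from the PR), it could be prepared like this:

```shell
# On the remote cluster: create a service account for the RHDH kubernetes plugin
oc create serviceaccount rhdh-remote -n default

# Grant it read access to the resources the topology/kubernetes plugins need
oc adm policy add-cluster-role-to-user cluster-reader -z rhdh-remote -n default

# Mint a token to paste into serviceAccountToken above
oc create token rhdh-remote -n default
```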
Now that your RHDH instance is up and running, monitoring the resources of the remote cluster too, you have to register the remote cluster in the RHDH argoCD instance. For more information about this step see here. More specifically, you'll need to add a new secret to the RHDH namespace, e.g.:

```yaml
kind: Secret
apiVersion: v1
metadata:
  name: remote-cluster
  namespace: ai-rhdh
  labels:
    argocd.argoproj.io/secret-type: cluster
stringData:
  config: |
    {"bearerToken": "your-token-see-below-how-you-obtain-it", "tlsClientConfig": {"insecure": true}}
  name: api-mycluster-p3-openshiftapps-com:443
  server: https://api.mycluster.p3.openshiftapps.com:443
type: Opaque
```

Note: to get the token for the
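One way to obtain the bearer token and register the secret, as a sketch under assumptions (the service account name, its namespace, and the manifest filename are placeholders, not from the PR):

```shell
# On the remote cluster: mint a token for the service account argoCD will use
# (service account name and namespace are placeholders)
oc create token rhdh-remote -n default

# On the RHDH cluster: paste the token into the cluster secret manifest,
# then apply it to the RHDH namespace
oc apply -f remote-cluster-secret.yaml -n ai-rhdh
```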
This is just a suggestion from my side. When you try to deploy the gitops resources of the software template on the remote cluster, the selected namespace won't be initialized. That said, you won't be able to pull the new image once the gitops repo is updated. A very quick workaround (not optimal, but it works) is to install RHDH on the remote cluster too and initialize the namespace you are going to use for the remote cluster. For example, before you deploy remotely you can simply install a chatbot software template in the same namespace, from the RHDH instance of the remote cluster. Note, again, this is a workaround to tackle the remote namespace initialization. A permanent solution is out of the scope of this PR and will be addressed as part of RHDHPAI-622 & RHDHPAI-581.

Software Template Setup
I agree from the documentation perspective. As mentioned in the instructions I just wrote, it should be covered as part of RHDHPAI-622 & RHDHPAI-581. I feel this should be ok as the
@thepetk I followed the instructions and it seems like the PLRs are running on the host cluster 🤔 I am using your template https://github.com/thepetk/ai-lab-template/blob/temp_changes/templates/codegen2/template.yaml
Yeah, this is the reason I went with two separate application components. With this setup we could have multiple remote clusters involved in the process, and ofc no secret related to the PLR functionality is shared across the clusters. I tried to capture it in the PR description too:
Okay, that makes sense. However, I did not see a deployment on the remote cluster, and I did get a bunch of argoCD errors 🤔
Could you share the error trace? |
Is there a piece I am missing in the setup? I followed the instructions and the template prompts me for the remote URL but after entering it the application is still deployed on the host and my remote SA doesn't have permissions when I select it in the 'Topology' tab
Let's check it together, I think that might be better!
lgtm after our meeting to go through the changes and setup for testing. We should make sure to update the actual template UI so that the deployment namespace is clear.
@maysunfaisal @Jdubrick huge thanks for the review, especially with this complicated setup involved.
What does this PR do?:
The PR adds support for remote cluster deployments. More specifically:
- Updates the `application-dev.yaml` flow, so in case of a remote cluster deployment we use a different destination for this application.
- Splits `application-dev` into two parts (`app` and `app-tekton`). As a result, when a remote deployment is selected, the tekton resources are monitored by a separate argo app. This way we are able to maintain the PoC webhook functionality.

Which issue(s) this PR fixes:
Fixes RHDHPAI-580
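As a hedged illustration of the split described above (this is not the PR's actual manifest; the application name, repo URL, path, and namespace are all placeholders), the `app` half pointed at a remote cluster might look roughly like:

```yaml
# Hypothetical sketch: the "app" Application targets the registered remote
# cluster, while the "app-tekton" counterpart would keep destination.server
# on the host cluster so the PipelineRuns and webhook wiring stay local.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-template-app            # placeholder
  namespace: ai-rhdh
spec:
  project: default
  source:
    repoURL: https://github.com/example/my-template-gitops   # placeholder
    targetRevision: main
    path: components/app           # placeholder
  destination:
    server: https://api.mycluster.p3.openshiftapps.com:443   # the registered remote cluster
    namespace: my-namespace        # placeholder
  syncPolicy:
    automated: {}
```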
PR acceptance criteria:
Testing and documentation do not need to be complete in order for this PR to be approved. We just need to ensure tracking issues are opened and linked to this PR, if they are not in the PR scope due to various constraints.
Tested and Verified
Documentation (READMEs, Product Docs, Blogs, Education Modules, etc.)
How to test changes / Special notes to the reviewer:
Setup the two clusters
- Set up the two clusters with the `ai-rhdh-installer`, so the topology and kubernetes plugins show resources for the other cluster.
- `tekton-plugins.yaml`.

Deploy templates
- Using the `ai-lab-template` fork, update the import-gitops-template script to point to this branch.

Examples
RHOAI Deployment
The deployed template overview
The two different clusters in topology
PipelineRuns (In host cluster)
Deployments
Important Notes
One important note I'd like everyone to keep in mind during review is about the namespace we run the pipelineRuns in on the host cluster. The current PR uses the same namespace name as the one on the remote cluster. My question is whether we would like to give the user an option to specify this too.