Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Amazon EMR on Amazon EKS #17178

Closed
wants to merge 4 commits into from

Conversation

wanderijames
Copy link

Adding operators for interacting with EMR on EKS as per #17175

  • Add operator for starting job
  • Add operator for canceling a job
  • Add operator for checking job state

closes: #17175
related: #17175

- Add operators for starting job
- Add operator for canceling a job
- Add operator for checking job state
@boring-cyborg boring-cyborg bot added area:providers provider:amazon AWS/Amazon - related issues labels Jul 22, 2021
@boring-cyborg
Copy link

boring-cyborg bot commented Jul 22, 2021

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
Here are some useful points:

  • Pay attention to the quality of your code (flake8, mypy and type annotations). Our pre-commits will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it’s a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: dev@airflow.apache.org
    Slack: https://s.apache.org/airflow-slack

@mik-laj
Copy link
Member

mik-laj commented Jul 22, 2021

We currently have one operator that allows us to run Spark job on Kubernetes. It works with both EKS and GCP as well as any other Kubernetes platform. - SparkKubernetesOperator. Why would anyone use this operator instead of the generic operator for Kubernetes?

@wanderijames
Copy link
Author

wanderijames commented Jul 23, 2021

We currently have one operator that allows us to run Spark job on Kubernetes. It works with both EKS and GCP as well as any other Kubernetes platform. - SparkKubernetesOperator. Why would anyone use this operator instead of the generic operator for Kubernetes?

Hey @mik-laj, I am aware of apache spark and livy operator and also EMR operator. However, EMR on EKS works differently because EMR launches virtual cluster in your EKS. The pods (spark master and executors) launched are ephemeral, only existing when a start job is invoked. For more information, kindly visit https://aws.amazon.com/emr/features/eks/

In addition, SparkKubernetesOperator is only suitable if you have Spark cluster has been setup in Kubernetes. In this case, it is not.

Copy link
Contributor

@dacort dacort left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a few comments/questions.

Realizing there is #16766 already submitted, the big difference between the two seems to be the addition of a couple different operators and the default behavior in #16766 being that the Operator waits for the job to complete.

I haven't done a thorough review of this, but I think (depending on if they're needed) we should try to merge 1 or more of the additional operators into my PR. I'm just not familiar enough with real-world DAGs to know to which degree the additional operators are necessary. If you have any feedback on that, would much appreciate it!

from airflow.providers.amazon.aws.hooks.emr_containers import EmrContainersHook


class EmrContainersGetJobStateOperator(BaseOperator):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious what the use case is for an operator that just returns the job state?

Apologies if this is a silly question - still getting up to speed on typical Airflow patterns.

Copy link
Author

@wanderijames wanderijames Aug 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that you asked me that question, I don't think this has a straight forward use case as an operator.

configuration_overrides=configuration_overrides,
tags=tags,
name=name,
client_token=client_token,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see you defining client_token anywhere. Have you tested this in a live environment to validate that it works?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is being used in line 89;

response = emr_containers.start_job(**self.start_job_params)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but it's not initialized anywhere. The default is None, which means if the client doesn't pass in a client_token, a new job won't get created after the first one. I ran into a similar issue and had to generate a UUID as the default:

client_token or str(uuid4())

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @dacort for this.

…state.py

Co-authored-by: Damon P. Cortesi <d.lifehacker@gmail.com>
@dacort
Copy link
Contributor

dacort commented Aug 20, 2021

Here's my current thinking on this - I like the addition of the Start and Cancel operators. I think we should try to get #16766 merged and then you can 1. rebase off that commit and 2. add documentation to the emr_eks.rst file on how to use the two additional operators.

Thoughts?

@wanderijames
Copy link
Author

Here's my current thinking on this - I like the addition of the Start and Cancel operators. I think we should try to get #16766 merged and then you can 1. rebase off that commit and 2. add documentation to the emr_eks.rst file on how to use the two additional operators.

Thoughts?

I agree @dacort , let's do that.

@github-actions
Copy link

github-actions bot commented Oct 7, 2021

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Oct 7, 2021
@github-actions github-actions bot closed this Oct 12, 2021
@eladkal
Copy link
Contributor

eladkal commented Nov 6, 2021

@wanderijames #16766 was merged. Would you like to proceed?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:providers provider:amazon AWS/Amazon - related issues stale Stale PRs per the .github/workflows/stale.yml policy file
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Operators for orchestrating EMR EKS Jobs
4 participants