Add support for Spark Connect to pyspark decorator #35665

Merged
merged 5 commits into from
Nov 16, 2023

Conversation

bolkedebruin
Contributor

Spark Connect was introduced in Apache Spark 3.4. It allows remote connectivity to a Spark cluster using the DataFrame API.
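
For readers unfamiliar with Spark Connect, here is a minimal sketch of what a remote connection looks like from the client side. It assumes the `sc://host:port/;key=value` connection string format and the default port 15002 from the Spark Connect documentation. The helper names `build_spark_connect_url` and `get_remote_session` are hypothetical and are not the code added by this PR.

```python
from urllib.parse import quote

DEFAULT_SPARK_CONNECT_PORT = 15002  # default port of a Spark Connect server


def build_spark_connect_url(host, port=DEFAULT_SPARK_CONNECT_PORT, params=None):
    """Build an ``sc://host:port/;key=value`` Spark Connect connection string.

    Hypothetical helper for illustration only.
    """
    url = f"sc://{host}:{port}/"
    if params:
        # Optional parameters follow the path as ;key=value pairs.
        url += "".join(
            f";{quote(str(key), safe='')}={quote(str(value), safe='')}"
            for key, value in params.items()
        )
    return url


def get_remote_session(url):
    """Open a remote session; requires pyspark>=3.4 with the Spark Connect extras."""
    # Imported lazily so the URL helper above works without pyspark installed.
    from pyspark.sql import SparkSession

    return SparkSession.builder.remote(url).getOrCreate()
```

With a reachable Spark Connect server, `get_remote_session(build_spark_connect_url("localhost"))` would return a session whose DataFrame operations are executed on the remote cluster.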

Contributor

@eladkal eladkal Nov 16, 2023

In general we prefer to keep service classes in a single file. For example: #20139

That makes it easier for developers to discover the available classes.

Contributor Author

O. 🤪

Contributor Author

I understand the pattern, but I think it is outside the scope of this PR: it doesn't belong 'naturally' with any of the others, so it would require refactoring all of them.

Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>
@bolkedebruin bolkedebruin merged commit 9103ea1 into apache:main Nov 16, 2023
@bolkedebruin bolkedebruin deleted the spark_connect branch November 16, 2023 08:51
@Taragolis
Contributor

Oh... the Spark connection page would now contain 4 different connections 😢

For reference: #28790

@bolkedebruin
Contributor Author

I think the connection scheme would need rework to prevent this, for example by introducing a flavor or subtype of a particular connection. JDBC, Connect, and Submit are significantly different from each other.

@ephraimbuddy ephraimbuddy added the changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) label Nov 20, 2023
@ephraimbuddy ephraimbuddy added this to the Airflow 2.8.0 milestone Nov 20, 2023
Labels
area:providers changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) kind:documentation provider:apache-spark
5 participants