Fix DagProcessorJob integration for standalone dag-processor #30278
Merged
pierrejeambrun
merged 1 commit into
apache:main
from
potiuk:fix-dag-processor-standalone-job
Mar 24, 2023
The DagProcessorJob integration implemented in apache#28799 was incomplete. It missed a few crucial changes:
* DagProcessorJob was not imported in airflow/models/__init__.py. Without that import, `airflow jobs check` failed when querying DagProcessorJob through the BaseJob query, because DagProcessorJob was not registered by the time the query ran (so polymorphic ORM model retrieval was not aware of the DagProcessorJob model).
* The `airflow jobs check` command did not accept DagProcessorJob as a valid job type, so it was impossible to monitor it.
* The processor manager did not set heartbeats periodically, so the Job for the DagFileProcessor was considered not alive fairly quickly, even while the standalone dag-processor was running.
This PR fixes all three problems.
Fixes: apache#30251
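For readers unfamiliar with the first problem, here is a minimal sketch of why a missing import can break a polymorphic lookup. It is a plain-Python stand-in, not SQLAlchemy's or Airflow's actual mechanism: `SketchBaseJob`, `SketchDagProcessorJob`, and `from_row` are hypothetical names, and defining the subclass in the middle of the script stands in for importing its module.

```python
class SketchBaseJob:
    """Hypothetical base class that maps a job_type string to a subclass."""

    _registry = {}
    job_type = "BaseJob"

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # A subclass only becomes known once its class body has executed,
        # which in real code means once its module has been imported.
        SketchBaseJob._registry[cls.job_type] = cls

    @classmethod
    def from_row(cls, row):
        """Build the right subclass for a stored row, or fail if it is unknown."""
        try:
            job_cls = cls._registry[row["job_type"]]
        except KeyError:
            raise LookupError(
                f"no class registered for job_type {row['job_type']!r}"
            ) from None
        return job_cls()


# A row that was written by a process that knew about the subclass.
row = {"job_type": "DagProcessorJob"}

# Before the subclass is defined (imported), the lookup fails.
try:
    SketchBaseJob.from_row(row)
    error_before = None
except LookupError as exc:
    error_before = str(exc)


# Defining the subclass here plays the role of the missing import.
class SketchDagProcessorJob(SketchBaseJob):
    job_type = "DagProcessorJob"


# After registration, the same lookup succeeds.
loaded_after = SketchBaseJob.from_row(row)
```

The point of the sketch is only the ordering: the row exists in storage either way, but the reader can resolve it only after the class has been registered.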
pierrejeambrun
approved these changes
Mar 24, 2023
pierrejeambrun
pushed a commit
that referenced
this pull request
Mar 24, 2023
The DagProcessorJob integration implemented in #28799 was incomplete. It missed a few crucial changes:
* DagProcessorJob was not imported in airflow/models/__init__.py. Without that import, `airflow jobs check` failed when querying DagProcessorJob through the BaseJob query, because DagProcessorJob was not registered by the time the query ran (so polymorphic ORM model retrieval was not aware of the DagProcessorJob model).
* The `airflow jobs check` command did not accept DagProcessorJob as a valid job type, so it was impossible to monitor it.
* The processor manager did not set heartbeats periodically, so the Job for the DagFileProcessor was considered not alive fairly quickly, even while the standalone dag-processor was running.
This PR fixes all three problems.
Fixes: #30251
(cherry picked from commit c858509)
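The third problem (a running process whose job looks dead) can be illustrated with a small sketch. This is an assumption-laden model, not Airflow's implementation: `SketchLivenessJob`, its `grace_multiplier` default, and the `simulate` helper are hypothetical, and the only idea taken from the commit message is that a job is judged alive by how recent its last heartbeat is.

```python
from datetime import datetime, timedelta, timezone


class SketchLivenessJob:
    """Hypothetical job record whose liveness depends on its last heartbeat."""

    def __init__(self, heartrate_seconds, started_at):
        self.heartrate = heartrate_seconds
        self.latest_heartbeat = started_at

    def heartbeat(self, now):
        # Record that the process was still running at `now`.
        self.latest_heartbeat = now

    def is_alive(self, now, grace_multiplier=2.0):
        # Alive only if the last heartbeat is recent enough (assumed rule).
        age = (now - self.latest_heartbeat).total_seconds()
        return age < self.heartrate * grace_multiplier


def simulate(send_heartbeats):
    """Run a 30-second process and report whether a monitor sees it as alive."""
    start = datetime(2023, 3, 24, 12, 0, tzinfo=timezone.utc)
    job = SketchLivenessJob(heartrate_seconds=5, started_at=start)
    for offset in range(5, 35, 5):
        if send_heartbeats:
            job.heartbeat(start + timedelta(seconds=offset))
    return job.is_alive(start + timedelta(seconds=30))


alive_without_heartbeats = simulate(send_heartbeats=False)
alive_with_heartbeats = simulate(send_heartbeats=True)
```

In both runs the simulated process is running the whole time; only the run that heartbeats periodically is reported as alive.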
potiuk
added a commit
to potiuk/airflow
that referenced
this pull request
Apr 27, 2023
The standalone file processor, as of apache#30278, accidentally introduced an artificial delay between DAG processing loops by adding a heartbeat without setting the "only_if_necessary" flag to True. If DAG file processing was fast (faster than the scheduler job_heartbeat_sec), this introduced an unnecessary pause before the next DAG file processor loop (until that time had passed). It also inflated the dag_processing_last_duration metric (it would always show at least job_heartbeat_sec). Setting the "only_if_necessary" flag fixes the problem.
Fixes: apache#30593
Fixes: apache#30884
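The delay described in this commit message can be reproduced with a small sketch. It assumes a heartbeat that, when called without the flag, waits until a full heartbeat interval has elapsed before recording itself, and that returns immediately when the flag is set and the interval has not yet passed. `FakeClock`, `SketchHeartbeatJob`, and `run_loops` are hypothetical names; only the `only_if_necessary` flag name comes from the commit message.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeClock:
    """Deterministic clock so the example never really sleeps."""

    now: float = 0.0
    slept: List[float] = field(default_factory=list)

    def sleep(self, seconds: float) -> None:
        self.slept.append(seconds)
        self.now += seconds


@dataclass
class SketchHeartbeatJob:
    heartrate: float
    clock: FakeClock
    latest_heartbeat: float = 0.0

    def heartbeat(self, only_if_necessary: bool = False) -> None:
        elapsed = self.clock.now - self.latest_heartbeat
        if only_if_necessary and elapsed < self.heartrate:
            # Too early for another heartbeat: return without waiting.
            return
        remaining = self.heartrate - elapsed
        if remaining > 0:
            # Without the flag, the caller is blocked until the interval passes.
            self.clock.sleep(remaining)
        self.latest_heartbeat = self.clock.now


def run_loops(only_if_necessary, loops=3, work_seconds=1.0, heartrate=5.0):
    """Return the simulated wall-clock time spent on `loops` processing loops."""
    clock = FakeClock()
    job = SketchHeartbeatJob(heartrate=heartrate, clock=clock)
    for _ in range(loops):
        clock.now += work_seconds  # simulated fast DAG file processing
        job.heartbeat(only_if_necessary=only_if_necessary)
    return clock.now


total_without_flag = run_loops(only_if_necessary=False)  # 15.0 simulated seconds
total_with_flag = run_loops(only_if_necessary=True)      # 3.0 simulated seconds
```

With one second of work per loop and a five-second heartbeat interval, each loop without the flag is padded to five seconds, which matches the inflated duration the commit message describes; with the flag, the loops take only as long as the work itself.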
potiuk
added a commit
that referenced
this pull request
Apr 27, 2023
The standalone file processor, as of #30278, accidentally introduced an artificial delay between DAG processing loops by adding a heartbeat without setting the "only_if_necessary" flag to True. If DAG file processing was fast (faster than the scheduler job_heartbeat_sec), this introduced an unnecessary pause before the next DAG file processor loop (until that time had passed). It also inflated the dag_processing_last_duration metric (it would always show at least job_heartbeat_sec). Setting the "only_if_necessary" flag fixes the problem.
Fixes: #30593
Fixes: #30884
ephraimbuddy
pushed a commit
that referenced
this pull request
Apr 27, 2023
The standalone file processor, as of #30278, accidentally introduced an artificial delay between DAG processing loops by adding a heartbeat without setting the "only_if_necessary" flag to True. If DAG file processing was fast (faster than the scheduler job_heartbeat_sec), this introduced an unnecessary pause before the next DAG file processor loop (until that time had passed). It also inflated the dag_processing_last_duration metric (it would always show at least job_heartbeat_sec). Setting the "only_if_necessary" flag fixes the problem.
Fixes: #30593
Fixes: #30884
(cherry picked from commit 00ab45f)
Labels
area:CLI
area:Scheduler
including HA (high availability) scheduler
type:bug-fix
Changelog: Bug Fixes