Allow users to opt-out dbtRunner during DAG parsing #1495

Merged
merged 7 commits into from
Jan 29, 2025

@tatiana (Collaborator) commented Jan 28, 2025

While speaking to a customer about #1484, they mentioned they have the following setup:

  • dbt-databricks installed in the same Python virtualenv as Cosmos/Airflow
  • dbt-bigquery installed in a separate Python virtualenv using Astro Dockerfile

They run DAGs using both adapters with the same image. This means 1.9.0a3 breaks their setup: they use LoadMode.DBT_LS and, since dbtRunner runs inside the Airflow virtualenv, only dbt-databricks projects can be parsed. We therefore have to let users opt in to or out of using dbtRunner during DAG parsing, similar to what was already done for task execution in ExecutionConfig.
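The selection rule described above can be sketched in a few lines. This is a minimal illustration, not the Cosmos implementation: the `InvocationMode` enum below is a local stand-in that reuses the names from this PR, and `dbt_core_importable` and `resolve_invocation_mode` are hypothetical helpers. It also assumes that an explicit setting always takes precedence over auto-detection.

```python
import importlib.util
from enum import Enum
from typing import Optional


class InvocationMode(Enum):
    """Local stand-in (assumption) mirroring the names used in this PR."""

    SUBPROCESS = "subprocess"
    DBT_RUNNER = "dbt_runner"


def dbt_core_importable() -> bool:
    """Return True if a top-level ``dbt`` package is importable here."""
    return importlib.util.find_spec("dbt") is not None


def resolve_invocation_mode(
    requested: Optional[InvocationMode],
    dbt_available: bool,
) -> InvocationMode:
    """Pick how ``dbt ls`` would be invoked during DAG parsing.

    Assumption: an explicit request always wins. Without one, dbtRunner
    is chosen only when dbt-core is importable in the Airflow environment.
    """
    if requested is not None:
        return requested
    if dbt_available:
        return InvocationMode.DBT_RUNNER
    return InvocationMode.SUBPROCESS


if __name__ == "__main__":
    # A user with dbt-bigquery in a separate virtualenv opts out explicitly.
    mode = resolve_invocation_mode(InvocationMode.SUBPROCESS, dbt_core_importable())
    print(mode.name)
```

With the customer's setup, passing `InvocationMode.SUBPROCESS` keeps parsing on the dbt executable from the separate virtualenv even though dbt-core is importable next to Airflow.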

@tatiana tatiana changed the title Allow users to opt-out dbtRunner during DAG parsing Allow users to opt-out dbtRunner during DAG parsing Jan 28, 2025

netlify bot commented Jan 28, 2025

Deploy Preview for sunny-pastelito-5ecb04 ready!

Name Link
🔨 Latest commit 699db27
🔍 Latest deploy log https://app.netlify.com/sites/sunny-pastelito-5ecb04/deploys/679a23995e8381000887c716
😎 Deploy Preview https://deploy-preview-1495--sunny-pastelito-5ecb04.netlify.app

cloudflare-workers-and-pages bot commented Jan 28, 2025

Deploying astronomer-cosmos with Cloudflare Pages

Latest commit: 699db27
Status: ✅  Deploy successful!
Preview URL: https://bb12c50c.astronomer-cosmos.pages.dev
Branch Preview URL: https://opt-out-dbtrunner-in-dbt-ls.astronomer-cosmos.pages.dev


@tatiana tatiana added the customer request An Astronomer customer made requested this label Jan 28, 2025
@tatiana tatiana marked this pull request as ready for review January 28, 2025 23:00
@dosubot dosubot bot added size:M This PR changes 30-99 lines, ignoring generated files. area:parsing Related to parsing DAG/DBT improvement, issues, or fixes dbt:parse Primarily related to dbt parse command or functionality parsing:dbt_ls Issues, questions, or features related to dbt_ls parsing labels Jan 28, 2025
@tatiana tatiana self-assigned this Jan 28, 2025

codecov bot commented Jan 28, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.05%. Comparing base (e11e5ae) to head (699db27).
Report is 3 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1495   +/-   ##
=======================================
  Coverage   97.05%   97.05%           
=======================================
  Files          77       77           
  Lines        4483     4484    +1     
=======================================
+ Hits         4351     4352    +1     
  Misses        132      132           


@tatiana tatiana added this to the Cosmos 1.9.0 milestone Jan 28, 2025
@pankajkoti (Contributor) left a comment

LGTM.

I guess it would be nice to highlight somewhere in our documentation that, starting with Cosmos 1.9, we're optimising to use dbtRunner by default for parsing, and also to show users how they can change the default with the corresponding param in their DAG. WDYT?

@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. and removed size:M This PR changes 30-99 lines, ignoring generated files. labels Jan 29, 2025
@tatiana (Collaborator, Author) commented Jan 29, 2025

Thanks for the feedback, @pankajastro & @pankajkoti!

I've added a breaking change notice in the changelog: 14b0a2c

As well as docs in 53429d9

@tatiana tatiana merged commit c4e4abc into main Jan 29, 2025
65 of 66 checks passed
@tatiana tatiana deleted the opt-out-dbtrunner-in-dbt-ls branch January 29, 2025 12:52
@pankajkoti pankajkoti mentioned this pull request Feb 14, 2025
pankajkoti added a commit that referenced this pull request Feb 20, 2025
Breaking changes

* When using ``LoadMode.DBT_LS``, Cosmos will now attempt to use
``dbtRunner`` instead of a subprocess to run ``dbt ls``.
While this brings significant performance improvements (half the
vCPU usage and some reduction in memory consumption), it may not work
in scenarios where users have multiple Python virtual environments to
manage different versions of dbt and its adapters. In those cases,
set ``RenderConfig(invocation_mode=InvocationMode.SUBPROCESS)``
to keep the behaviour Cosmos had in previous versions.
Additional information `here
<https://astronomer.github.io/astronomer-cosmos/configuration/parsing-methods.html#dbt-ls>`_
and `here
<https://astronomer.github.io/astronomer-cosmos/configuration/render-config.html#how-to-run-dbt-ls-invocation-mode>`_.

Features

* Use ``dbtRunner`` in the DAG Processor when using ``LoadMode.DBT_LS``
if ``dbt-core`` is available by @tatiana in #1484. Additional
information `here
<https://astronomer.github.io/astronomer-cosmos/configuration/parsing-methods.html#dbt-ls>`_.
* Allow users to opt-out of ``dbtRunner`` during DAG parsing with
``InvocationMode.SUBPROCESS`` by @tatiana in #1495. Check out the
`documentation
<https://astronomer.github.io/astronomer-cosmos/configuration/render-config.html#how-to-run-dbt-ls-invocation-mode>`_.
* Add structure to support multiple db for async operator execution by
@pankajastro in #1483
* Support overriding the ``profile_config`` per dbt node or folder using
config by @tatiana in #1492. More information `here
<https://astronomer.github.io/astronomer-cosmos/profiles/#profile-customise-per-node>`_.
* Create and run accurate SQL statements when using
``ExecutionMode.AIRFLOW_ASYNC`` by @pankajkoti, @tatiana and
@pankajastro in #1474
* Add AWS ECS task run execution mode by @CarlosGitto and @aoelvp94 in
#1507
* Add support for running ``DbtSourceOperator`` individually by
@victormacaubas in #1510
* Add setup task for async executions by @pankajastro in #1518
* Add teardown task for async executions by @pankajastro in #1529
* Add ``ProjectConfig.install_dbt_deps`` & change operator
``install_deps=True`` as default by @tatiana in #1521
* Extend Virtualenv operator and mock dbt adapters for setup & teardown
tasks in ``ExecutionMode.AIRFLOW_ASYNC`` by @pankajkoti, @tatiana and
@pankajastro in #1544

Bug Fixes

* Fix select complex intersection of three tag-based graph selectors by
@tatiana in #1466
* Fix custom selector behaviour when the model name contains periods by
@yakovlevvs and @60098727 in #1499
* Filter dbt and non-dbt kwargs correctly for async operator by
@pankajastro in #1526

Enhancement

* Fix OpenLineage deprecation warning by @CorsettiS in #1449
* Move ``DbtRunner`` related functions into ``dbt/runner.py`` module by
@tatiana in #1480
* Add ``on_warning_callback`` to ``DbtSourceKubernetesOperator`` and
refactor previous operators by @LuigiCerone in #1501
* Gracefully error when users set incompatible ``RenderConfig.dbt_deps``
and ``operator_args`` ``install_deps`` by @tatiana in #1505
* Store compiled SQL as template field for
``ExecutionMode.AIRFLOW_ASYNC`` by @pankajkoti in #1534

Docs

* Improve ``RenderConfig`` arguments documentation by @tatiana in #1514
* Improve callback documentation by @tatiana in #1516
* Improve partial parsing docs by @tatiana in #1520
* Fix typo in selecting & excluding docs by @pankajastro in #1523
* Document ``async_py_requirements`` added in ``ExecutionConfig`` for
``ExecutionMode.AIRFLOW_ASYNC`` by @pankajkoti in #1545

Others

* Ignore dbt package tests when running Cosmos tests by @tatiana in
#1502
* Refactor to consolidate async dbt adapter code by @pankajkoti in #1509
* Log elapsed time for sql file(s) upload/download by @pankajastro in
#1536
* Remove the fallback operator for async task by @pankajastro in #1538
* GitHub Actions Dependabot: #1487
* Pre-commit updates: #1473, #1493, #1503, #1531
Labels
area:parsing Related to parsing DAG/DBT improvement, issues, or fixes customer request An Astronomer customer made requested this dbt:parse Primarily related to dbt parse command or functionality parsing:dbt_ls Issues, questions, or features related to dbt_ls parsing size:L This PR changes 100-499 lines, ignoring generated files.
3 participants