Fix custom "+tag:some_tag" selector issue related to tests tag inheritance #1466

Merged on Jan 21, 2025 (8 commits)

@tatiana tatiana commented Jan 14, 2025

The selector method _should_include_node makes test tasks inherit tags from their parent nodes. While this behaviour is acceptable and desirable in some cases, it can cause problems when graph selectors are combined with tags.

This PR improves the test coverage, narrows down the root cause, and fixes the issue reported by Astronomer customers. More details below.
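To make the tag inheritance concern concrete, here is a minimal sketch of how a test that inherits its parents' tags could widen a `+tag:` graph selection. It is an illustration only: the `Node` class, the helper functions and the three-node project are hypothetical and are not the Cosmos implementation or the confirmed root cause.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical, simplified stand-in for a dbt node (not the Cosmos class)."""
    name: str
    resource_type: str  # "model" or "test"
    tags: set = field(default_factory=set)
    depends_on: list = field(default_factory=list)


def effective_tags(node, nodes, inherit_for_tests):
    """Return the node's own tags, plus its parents' tags if it is a test and inheritance is on."""
    tags = set(node.tags)
    if inherit_for_tests and node.resource_type == "test":
        for parent in node.depends_on:
            tags |= nodes[parent].tags
    return tags


def ancestors(name, nodes):
    """Collect every upstream node name by walking depends_on edges."""
    seen = set()
    stack = list(nodes[name].depends_on)
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(nodes[current].depends_on)
    return seen


def select_plus_tag(tag, nodes, inherit_for_tests):
    """Emulate '+tag:<tag>': every node carrying the tag, plus all of its ancestors."""
    selected = set()
    for name, node in nodes.items():
        if tag in effective_tags(node, nodes, inherit_for_tests):
            selected |= {name} | ancestors(name, nodes)
    return selected


nodes = {
    "customers": Node("customers", "model", tags={"accounts"}),
    "orders": Node("orders", "model"),  # deliberately untagged
    # A relationship-style test that depends on both models.
    "orders_customer_fk": Node("orders_customer_fk", "test", depends_on=["orders", "customers"]),
}

without_inheritance = select_plus_tag("accounts", nodes, inherit_for_tests=False)
with_inheritance = select_plus_tag("accounts", nodes, inherit_for_tests=True)

print(sorted(without_inheritance))
print(sorted(with_inheritance))
```

In this toy project, once the test inherits the `accounts` tag from `customers`, the `+` operator also walks up to the untagged `orders` model, so the selection grows beyond the nodes that were tagged explicitly.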

A user reported that they see the correct DbtDag when using Cosmos 1.8.1 with:

  • LoadMode.DBT_LS
  • RenderConfig(selector="accounts_marts")

Where the selector accounts_marts is defined as:

    - name: accounts_marts
      description: Run Accounts models
      definition:
        intersection:
          - '+tag:accounts'
          - '+tag:datamart'
          - '+tag:stratus'
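As a rough illustration of what this selector asks for, the sketch below resolves each `+tag:` term to the tagged nodes plus their ancestors and then intersects the three resulting sets. The project graph, node names and helper functions are hypothetical; this is neither dbt's nor Cosmos's selector code.

```python
from functools import reduce

# Hypothetical toy project: node name -> (tags, direct parents).
PROJECT = {
    "stg_accounts": (set(), []),
    "accounts_base": ({"accounts"}, ["stg_accounts"]),
    "accounts_mart": ({"accounts", "datamart", "stratus"}, ["accounts_base"]),
    "sales_mart": ({"datamart"}, []),
}


def ancestors(name):
    """Collect every upstream node name by walking the parent lists."""
    seen = set()
    stack = list(PROJECT[name][1])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PROJECT[current][1])
    return seen


def resolve(term):
    """Resolve 'tag:x' or '+tag:x' to a set of node names."""
    include_parents = term.startswith("+")
    method, _, value = term.lstrip("+").partition(":")
    if method != "tag":
        raise ValueError(f"this sketch only handles tag selectors, got {term!r}")
    matched = {name for name, (tags, _) in PROJECT.items() if value in tags}
    if include_parents:
        # Iterate over a copy because the set grows while ancestors are added.
        for name in list(matched):
            matched |= ancestors(name)
    return matched


def intersection(terms):
    """Keep only the nodes that every term selects."""
    return reduce(set.intersection, (resolve(term) for term in terms))


selected = intersection(["+tag:accounts", "+tag:datamart", "+tag:stratus"])
print(sorted(selected))
```

Note that the untagged `stg_accounts` node survives the intersection because it is an ancestor in all three expanded sets, while `sales_mart` is dropped because only the `datamart` term reaches it.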

The expected behaviour includes:

  • 164 Airflow tasks
  • 152 Local run tasks
  • 12 Snapshot tasks

However, when they attempt to run the same DbtDag using:

  • LoadMode.DBT_MANIFEST
  • RenderConfig(select=["+tag:accounts,+tag:datamart,+tag:stratus"])

Their DbtDag seems to have the wrong subset of nodes.

They reported:

  • 197 Airflow tasks
  • 183 Local run tasks
  • 14 Snapshot tasks
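For reference, the gap between the two reports can be worked out directly from the numbers above. The short sketch below only restates the quoted counts; the dictionary keys are labels chosen for this illustration.

```python
# Task counts quoted in the report, keyed by task kind.
expected = {"local_run": 152, "snapshot": 12}  # LoadMode.DBT_LS with the YAML selector
reported = {"local_run": 183, "snapshot": 14}  # LoadMode.DBT_MANIFEST with select

expected_total = sum(expected.values())
reported_total = sum(reported.values())

# Extra tasks per kind rendered by the manifest-based configuration.
extra = {kind: reported[kind] - expected[kind] for kind in expected}
extra_total = reported_total - expected_total

print(expected_total, reported_total, extra, extra_total)
```

The breakdowns add up to the quoted totals (164 and 197), and the manifest-based DAG contains 33 more tasks than expected: 31 local run tasks and 2 snapshot tasks.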

This pull request aims to reproduce and fix this issue.

netlify bot commented Jan 14, 2025

Deploy Preview for sunny-pastelito-5ecb04 canceled.

Name Link
🔨 Latest commit c8cbfe9
🔍 Latest deploy log https://app.netlify.com/sites/sunny-pastelito-5ecb04/deploys/678f7d1a80b01b00083e8534

cloudflare-workers-and-pages bot commented Jan 14, 2025

Deploying astronomer-cosmos with Cloudflare Pages

Latest commit: c8cbfe9
Status: ✅  Deploy successful!
Preview URL: https://cf3b5dcd.astronomer-cosmos.pages.dev
Branch Preview URL: https://select-intersection-graphs.astronomer-cosmos.pages.dev

@tatiana tatiana changed the title from 'WIP: Add support for intersection of "+tag:some_tag" in RenderConfig(select)' to 'WIP: Investigate issue in the intersection of "+tag:some_tag" in RenderConfig(select)' Jan 14, 2025
@tatiana tatiana changed the title from 'WIP: Investigate issue in the intersection of "+tag:some_tag" in RenderConfig(select)' to 'WIP: Investigate issue in the intersection of "+tag:some_tag" select statements' Jan 14, 2025

codecov bot commented Jan 14, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.57%. Comparing base (387558f) to head (c8cbfe9).
Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1466      +/-   ##
==========================================
- Coverage   96.95%   96.57%   -0.39%     
==========================================
  Files          73       73              
  Lines        4371     4374       +3     
==========================================
- Hits         4238     4224      -14     
- Misses        133      150      +17     


@tatiana tatiana force-pushed the select-intersection-graphs branch from a4371f6 to dec5687 January 20, 2025 13:01
@tatiana tatiana changed the title from 'WIP: Investigate issue in the intersection of "+tag:some_tag" select statements' to 'Investigate issue in the intersection of "+tag:some_tag" select statements' Jan 20, 2025
@tatiana tatiana marked this pull request as ready for review January 20, 2025 15:46
@dosubot dosubot bot added the labels size:L (This PR changes 100-499 lines, ignoring generated files), area:selector (Related to selector, like DAG selector, DBT selector, etc), parsing:dbt_ls (Issues, questions, or features related to dbt_ls parsing) and parsing:dbt_manifest (Issues, questions, or features related to dbt_manifest parsing) Jan 20, 2025
@tatiana tatiana changed the title from 'Investigate issue in the intersection of "+tag:some_tag" select statements' to 'Fix custom "+tag:some_tag" selector issue, related to test tag inheritance' Jan 20, 2025
@tatiana tatiana changed the title from 'Fix custom "+tag:some_tag" selector issue, related to test tag inheritance' to 'Fix custom "+tag:some_tag" selector issue related to tests tag inheritance' Jan 20, 2025
@tatiana tatiana added this to the Cosmos 1.9.0 milestone Jan 20, 2025

@pankajkoti pankajkoti left a comment

Looks great to me! Fantastic breakdown and analysis in identifying and addressing this fix 👏🏽

@tatiana tatiana merged commit 74c7c7d into main Jan 21, 2025
66 checks passed
@tatiana tatiana deleted the select-intersection-graphs branch January 21, 2025 11:02
@pankajkoti pankajkoti mentioned this pull request Feb 14, 2025
pankajkoti added a commit that referenced this pull request Feb 20, 2025
Breaking changes

* When using ``LoadMode.DBT_LS``, Cosmos will now attempt to use the
``dbtRunner`` instead of a subprocess to run ``dbt ls``.
While this represents significant performance improvements (half the
vCPU usage and some memory consumption improvement), this may not work
in scenarios where users had multiple Python virtual environments to
manage different versions of dbt and its adapters. In those cases,
please set ``RenderConfig(invocation_mode=InvocationMode.SUBPROCESS)``
to have the same behaviour Cosmos had in previous versions.
Additional information `here
<https://astronomer.github.io/astronomer-cosmos/configuration/parsing-methods.html#dbt-ls>`_
and `here
<https://astronomer.github.io/astronomer-cosmos/configuration/render-config.html#how-to-run-dbt-ls-invocation-mode>`_.

Features

* Use ``dbtRunner`` in the DAG Processor when using ``LoadMode.DBT_LS``
if ``dbt-core`` is available by @tatiana in #1484. Additional
information `here
<https://astronomer.github.io/astronomer-cosmos/configuration/parsing-methods.html#dbt-ls>`_.
* Allow users to opt-out of ``dbtRunner`` during DAG parsing with
``InvocationMode.SUBPROCESS`` by @tatiana in #1495. Check out the
`documentation
<https://astronomer.github.io/astronomer-cosmos/configuration/render-config.html#how-to-run-dbt-ls-invocation-mode>`_.
* Add structure to support multiple db for async operator execution by
@pankajastro in #1483
* Support overriding the ``profile_config`` per dbt node or folder using
config by @tatiana in #1492. More information `here
<https://astronomer.github.io/astronomer-cosmos/profiles/#profile-customise-per-node>`_.
* Create and run accurate SQL statements when using
``ExecutionMode.AIRFLOW_ASYNC`` by @pankajkoti, @tatiana and
@pankajastro in #1474
* Add AWS ECS task run execution mode by @CarlosGitto and @aoelvp94 in
#1507
* Add support for running ``DbtSourceOperator`` individually by
@victormacaubas in #1510
* Add setup task for async executions by @pankajastro in #1518
* Add teardown task for async executions by @pankajastro in #1529
* Add ``ProjectConfig.install_dbt_deps`` & change operator
``install_deps=True`` as default by @tatiana in #1521
* Extend Virtualenv operator and mock dbt adapters for setup & teardown
tasks in ``ExecutionMode.AIRFLOW_ASYNC`` by @pankajkoti, @tatiana and
@pankajastro in #1544

Bug Fixes

* Fix select complex intersection of three tag-based graph selectors by
@tatiana in #1466
* Fix custom selector behaviour when the model name contains periods by
@yakovlevvs and @60098727 in #1499
* Filter dbt and non-dbt kwargs correctly for async operator by
@pankajastro in #1526

Enhancement

* Fix OpenLineage deprecation warning by @CorsettiS in #1449
* Move ``DbtRunner`` related functions into ``dbt/runner.py`` module by
@tatiana in #1480
* Add ``on_warning_callback`` to ``DbtSourceKubernetesOperator`` and
refactor previous operators by @LuigiCerone in #1501
* Gracefully error when users set incompatible ``RenderConfig.dbt_deps``
and ``operator_args`` ``install_deps`` by @tatiana in #1505
* Store compiled SQL as template field for
``ExecutionMode.AIRFLOW_ASYNC`` by @pankajkoti in #1534

Docs

* Improve ``RenderConfig`` arguments documentation by @tatiana in #1514
* Improve callback documentation by @tatiana in #1516
* Improve partial parsing docs by @tatiana in #1520
* Fix typo in selecting & excluding docs by @pankajastro in #1523
* Document ``async_py_requirements`` added in ``ExecutionConfig`` for
``ExecutionMode.AIRFLOW_ASYNC`` by @pankajkoti in #1545

Others

* Ignore dbt package tests when running Cosmos tests by @tatiana in
#1502
* Refactor to consolidate async dbt adapter code by @pankajkoti in #1509
* Log elapsed time for sql file(s) upload/download by @pankajastro in
#1536
* Remove the fallback operator for async task by @pankajastro in #1538
* GitHub Actions Dependabot: #1487
* Pre-commit updates: #1473, #1493, #1503, #1531