Dag with high number of branches is slower than dags triggered in a cmd loop #33665
Hello,

I have the same tasks in two different DAGs: one DAG with 200 branches (DAG_BRANCH_200), and a second one containing only the tasks (SIMPLE_DAG) but executed 200 times from the command line.

DAG_BRANCH_200:

SIMPLE_DAG:

NB: parallelism = 100, max_active_runs_per_dag = 100

I'm attaching two screenshots of my PostgreSQL dashboard taken while both DAGs are running.

Here we can see that DAG_BRANCH_200 opens many more sessions, but only a few of them are active, and it reaches fewer transactions per second than SIMPLE_DAG. As a result, the command-line execution finishes faster and without errors, whereas the DAG with branches takes about 3x as long and ends up stuck with an error.

I can't find an explanation for this behavior. Thank you in advance for your reply.
Replies: 1 comment 4 replies
You are probably not using PgBouncer. See the note in the docs: https://airflow.apache.org/docs/apache-airflow/stable/howto/set-up-database.html#setting-up-a-postgresql-database

Airflow is known to open a large number of DB connections, and Postgres is known to start a separate process for each connection. PgBouncer is a fantastic way to reconcile these two seemingly conflicting requirements, especially when you attempt any kind of artificial benchmarking like this.

I think it would be good to repeat your tests with a well-configured PgBouncer.

BTW, I am not sure what you are referring to as "with branches". Your DAG shows many task groups, but I am not sure what "branches" means in this context. The naming conflicts somewhat with the "branching" concept used in Airflow DAGs, so I think you should elaborate on it a bit if your tests with PgBouncer show strange results you cannot explain.
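To make the PgBouncer suggestion a bit more concrete, here is a minimal sketch of what "putting PgBouncer between Airflow and Postgres" could look like. Everything specific in it is an assumption for illustration and not taken from the original setup: the helper names `build_pgbouncer_ini` and `airflow_conn_url`, the hosts, the credentials, the `auth_file` path, the `md5` auth type, and the pool sizes. The setting names themselves (`pool_mode`, `max_client_conn`, `default_pool_size`, `listen_port`) are standard `pgbouncer.ini` options, and 6432 is PgBouncer's usual port. Transaction pooling is chosen here as a common option for this kind of workload, so treat it as a starting point to tune rather than a recommendation.

```python
import configparser
import io


def build_pgbouncer_ini(db_name="airflow", pg_host="127.0.0.1", pg_port=5432,
                        pool_size=20, max_clients=1000):
    """Render a small pgbouncer.ini as text (all values are placeholders)."""
    cfg = configparser.ConfigParser()
    # [databases]: maps the name clients connect to onto the real Postgres server.
    cfg["databases"] = {
        db_name: f"host={pg_host} port={pg_port} dbname={db_name}",
    }
    # [pgbouncer]: many client connections share a small pool of server connections.
    cfg["pgbouncer"] = {
        "listen_addr": "127.0.0.1",
        "listen_port": "6432",
        "auth_type": "md5",                          # assumption: adjust to your auth setup
        "auth_file": "/etc/pgbouncer/userlist.txt",  # hypothetical path
        "pool_mode": "transaction",
        "max_client_conn": str(max_clients),         # connections accepted from Airflow
        "default_pool_size": str(pool_size),         # connections opened to Postgres
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


def airflow_conn_url(user, password, db_name="airflow",
                     host="127.0.0.1", port=6432):
    """Build a SQLAlchemy URL that targets PgBouncer instead of Postgres."""
    return f"postgresql+psycopg2://{user}:{password}@{host}:{port}/{db_name}"


if __name__ == "__main__":
    print(build_pgbouncer_ini())
    # Airflow (2.3+) would read this from AIRFLOW__DATABASE__SQL_ALCHEMY_CONN.
    print(airflow_conn_url("airflow", "secret"))
```

With something like this in place, Airflow's many short-lived connections land on PgBouncer, while Postgres itself only sees roughly `default_pool_size` backend processes per database/user pair, which is the effect worth checking on the dashboard when the benchmark is repeated.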