Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for arbitrary json in conn uri format #15100

Merged
merged 9 commits into from
Apr 14, 2021

Conversation

dstandish
Copy link
Contributor

@dstandish dstandish commented Mar 30, 2021

Overview

PRs #15013 and #13934 raise the issue that airflow's conn uri format cannot handle arbitrary json values.

This leaves us with connections that we can define in the web UI that are impossible to serialize to URI format (see #13934).

And with some secrets backends using json serialization instead of URI, this leaves us with an inconsistency -- some backends can produce connections where extra_dejson is a nested dict; others cannot.

This PR adds support for nested json within the URI format without breaking backward compatibility.

Details

URI generation

When generating the URI, the method get_uri will check whether the extra can be simply urlencoded without loss or error.

If so, it will use traditional urlencoding.

When loss or error would occur, get_uri will serialize the full dict to json and store under query param __extra__.

URI parsing

First airflow will check for the existence of special query param __extra__. If it exists, airflow will parse it.

If this param is not present, the traditional approach is used.

More background

Here's example given in #15013:

>>> Connection(conn_id='id', conn_type='http', extra='foobar').get_uri()
[2021-03-25 13:31:07,535] {connection.py:337} ERROR - Expecting value: line 1 column 1 (char 0)
Traceback (most recent call last):
  File "/Users/da.lum/code/python/airflow/airflow/models/connection.py", line 335, in extra_dejson
    obj = json.loads(self.extra)
  File "/nix/store/8kzdflq0v06fq0mh9m2fd73gnyqp57xr-python3-3.7.3/lib/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/nix/store/8kzdflq0v06fq0mh9m2fd73gnyqp57xr-python3-3.7.3/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/nix/store/8kzdflq0v06fq0mh9m2fd73gnyqp57xr-python3-3.7.3/lib/python3.7/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
[2021-03-25 13:31:07,535] {connection.py:338} ERROR - Failed parsing the json for conn_id id
'http://'

As of now, this PR adds support for this case. Though there is debate whether such values should be allowed in the extra field, or whether extra should always be key-value.

@github-actions
Copy link

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.

@mik-laj
Copy link
Member

mik-laj commented Mar 30, 2021

I like this solution with the use of a special key. I don't understand the role of base64 here. Is it necessary? It seems to me that the pure value will be easier to use, e.g. you will be able to check the connection configuration by looking at the environment variables.

As for supporting other types, I'm not sure we want to support it. We expect the extra field to be a Web UI dictionary. The use of primitive types also causes a problem with extending a given connection ie. it is not future-proff.

@dstandish
Copy link
Contributor Author

dstandish commented Mar 31, 2021

I like this solution with the use of a special key. I don't understand the role of base64 here. Is it necessary? It seems to me that the pure value will be easier to use, e.g. you will be able to check the connection configuration by looking at the environment variables.

Yep you are right. base64 isn't necessary. i have removed it. but still the uri remains equally indecipherable to the human eye.

As for supporting other types, I'm not sure we want to support it. We expect the extra field to be a Web UI dictionary. The use of primitive types also causes a problem with extending a given connection ie. it is not future-proff.

I agree with this. I like the idea of supporting nested dictionaries, but lists and primatives are not useful and probably a bad idea. You might be interested in the discussion on #15013; the inability to serialize a primitive extra is cited as a primary motivator for the PR. In the present PR I included support for primative / list because they make a valid point that it's not technically forbidden by Connection, and adding support in get_uri at least allows us to properly serialize allowed values. But yeah I'd be happy to forbid primitive and list.

What do you think we should do?

@github-actions
Copy link

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.

@mik-laj
Copy link
Member

mik-laj commented Mar 31, 2021

but still the uri remains equally indecipherable to the human eye.

Sometimes it is not necessary to fully understand, but being able to check if a given connection contains a given word, e.g. username / server address, is sufficient.

@dstandish
Copy link
Contributor Author

dstandish commented Apr 1, 2021

ok so to be clear @mik-laj you are in support of supporting arbitrary dict under extra, just not primative or list?

OR, you do not support this PR at all and you think extra should just be dict of strings i.e 'key1':'val1' and so on, e.g. not 'key1':{'key3':'key4'}

if you support this i'll remove the tests for primitive and list, so that the codebase does not explicitly endorse them.

@dstandish
Copy link
Contributor Author

@mik-laj fyi had to update one of your test cases in local filesystem

you were testing connection equality by using get_uri on a extra value that can now be properly serialized with get_uri

@github-actions
Copy link

github-actions bot commented Apr 1, 2021

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.

@ashb
Copy link
Member

ashb commented Apr 1, 2021

#13934 would be solved by allowing nested dicts.

dstandish and others added 2 commits April 1, 2021 09:41
Co-authored-by: Ash Berlin-Taylor <ash_github@firemirror.com>
Co-authored-by: Ash Berlin-Taylor <ash_github@firemirror.com>
@github-actions
Copy link

github-actions bot commented Apr 1, 2021

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.

@github-actions
Copy link

github-actions bot commented Apr 1, 2021

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.

@github-actions
Copy link

github-actions bot commented Apr 1, 2021

The Workflow run is cancelling this PR. Building image for the PR has been cancelled

@dstandish
Copy link
Contributor Author

@mik-laj do my replies make sense to you?

@mik-laj
Copy link
Member

mik-laj commented Apr 5, 2021

@mik-laj do my replies make sense to you?

SGTM.

@dstandish
Copy link
Contributor Author

SGTM

OK nice

Would you say we should merge as is? Do you think this needs to go to mailing list?

@mik-laj mik-laj self-requested a review April 5, 2021 22:54
@mik-laj mik-laj requested a review from ashb April 5, 2021 22:54
@github-actions
Copy link

github-actions bot commented Apr 5, 2021

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest master at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

@github-actions github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label Apr 5, 2021
@mik-laj
Copy link
Member

mik-laj commented Apr 5, 2021

Would you say we should merge as is? Do you think this needs to go to mailing list?

I don't think this needs a mailing list discussion, but I called a second reviewer - @ashb because he commented on this PR.

@dstandish
Copy link
Contributor Author

or perhaps @turbaszek since he was engaged in #13934?

@dstandish
Copy link
Contributor Author

ok i guess i'll merge it @mik-laj

thanks for the help on this one

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
full tests needed We need to run full set of tests for this PR to merge kind:documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants