Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

jsonhandler: allow reading yaml data from disk #254

Merged
merged 7 commits into from
Jul 16, 2018

Conversation

trevormccasland
Copy link
Contributor

This commit aims to change the jsonhandler to be able to read
data from disk if it is a yaml file.

Note:

  • Simply replacing the loads call with yaml.safe_load is not enough
    due to the nature of the NaN checker requiring an unsafe load[1].

closes #253
[1] 98adca6

This commit aims to change the jsonhandler to be able to read
data from disk if it is a yaml file.

Note:
  * Simply replacing the loads call with yaml.safe_load is not enough
    due to the nature of the NaN checker requiring an unsafe load[1].

closes cdent#253
[1] cdent@98adca6
Copy link
Owner

@cdent cdent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given how small this change is, I'm of two minds on how to proceed. Initially, I was thinking that it would make more sense as custom ContentHandler that replaced the existing JSONHandler: a subclass could override action. However, action ought to be rafactored to make that a bit easier with a method extracted for the data load.

I'm not sure. As usual, input appreciated from @FND and @jasonamyers. And you too @trevormccasland, of course.

@@ -76,11 +76,17 @@ or::
$.pets: <@pets.json
$.pets[0]: <@cat.json

to use YAML files like the JSON ones above, they must be placed in a
subdirectory::
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may as well explain why

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do.

@@ -95,7 +96,10 @@ def action(self, test, path, value=None):
rhs_path = test.replace_template('$' + rhs_path)
info = test.load_data_file(value.replace('<@', '', 1))
info = six.text_type(info, 'UTF-8')
value = self.loads(info)
if value.endswith('yaml'):
value = yaml.safe_load(info)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You say in the commit message that you can't use solely yaml.self_load() because of the NanChecker, but it is not clear how that is related to this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah good call, Looking at this again, I believe this has to do with a bug on how we test for isnan in tests/util.py we should cast the local var 'other' as a float because NaN is an IEEE 754 special value
(Pdb) other
'NaN'
(Pdb) math.isnan(other)
*** TypeError: a float is required

When making this change the test_unsafe_load tests were failing
due to the NaNChecker not casting it's input argument as a float
  * (Pdb) other
    'NaN'
    (Pdb) math.isnan(other)
    *** TypeError: a float is required
@trevormccasland
Copy link
Contributor Author

Next patch will have the new ContentHandler subclassing JSONHandler and overriding the loads method and hopefully a better fix for the NaN issue. Do we want to fail if the input is a string? I believe the gnocchi tests are telling me yes, so maybe its not an issue afterall.

@trevormccasland
Copy link
Contributor Author

So I think I see why the test case is failing now, it appears a float('NaN') dumped by json converts to a string when loaded in yaml.

myd = {'a': float('NaN')}
j = json.dumps(myd)
y = yaml.dump(myd)
y
'{a: .nan}\n'
j
'{"a": NaN}'
yaml.version
'3.12'
yaml.safe_load(j)
{'a': 'NaN'}
yaml.safe_load(y)
{'a': nan}

I'm still working on the YAMLHandler. I'm refactoring the ContentHandler base class to provide a load_data_file method that returns python data to get around this issue.

@cdent
Copy link
Owner

cdent commented Jul 3, 2018

Yeah, it's pretty clear that despite the rumors, at least in Python, JSON is not a strict subset of YAML and you can't load it with complete accuracy. I think this makes it a pretty clear that using as the loader needs to be something that is opt-in, and the way we do that with ContentHandlers is by having others.

I agree that having an overridable load_<something> method of some kind is a good way to go.

Will look through the current patches to see what else is up. Thanks for plugging away at this.

@@ -54,7 +55,7 @@ def dumps(data, pretty=False, test=None):

@staticmethod
def loads(data):
return json.loads(data)
return yaml.safe_load(data)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The root of the speedbumps you've hit is because you're engaging yaml.safe_load here, instead of in the action method, meaning yaml gets involved in all the JSON processing, not just when you are loading data from disk to be handled in a response_json_paths.

This might be the right thing, but did you consider doing it in action and keeping the behaviors separated?

This patch aims to allow loading yaml files with the use of a
custom handler named YAMLHandler.

NOTE:
  * modifications to suitemaker.py and case.py were needed to delegate
    the handlers to their respective collections because in case.py
    the _assert_response method will iterate over all response_handlers
    which includes every handle (default, response_handlers, and
    content_handlers) because of how they were added when the tests
    were made. This caused the test_yaml_handler tests to fail because
    both JSONHandler and YAMLHandler were called on the test.
Copy link
Owner

@cdent cdent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The two main issues now are

  • the name: YAMLHandler implies more than what the thing created does (see more within)
  • the adjustments to deal with multiples handlers that are of the same type are likely not in the right place, see the comments within

Thanks again for continuing to plug away at this. The overall concept of a load_data_file file is good. I'll try to find some time to look into how to manage the overlapping content handlers

@@ -22,6 +22,12 @@ string into a data structure in ``response_data`` that is used when
evaluating ``response_json_paths`` entries in a test or doing
JSONPath-based ``$RESPONSE[]`` substitutions.

A YAML content handler has been added to extend the JSON handler.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to call it something different than the YAML content handler because what the name YAMLContentHandler means is that this is a handler for request and/or response bodies which are YAML. That's not what this is.

Naming is hard, so I don't have any clever ideas at the moment but will think on it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went with the ugly but explicit route :/ YAMLDiskLoadingJSONHandler . Alternatives could be YDLJSONHandler or YDLJHandler for short? Any other suggestions are welcome too.

gabbi/driver.py Outdated
@@ -105,7 +105,6 @@ def build_tests(path, loader, host=None, port=8001, intercept=None,
for handler in (content_handlers + response_handlers +
handlers.RESPONSE_HANDLERS):
handler_objects.append(handler())

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unrelated?

gabbi/case.py Outdated
handlers = self.response_handlers
content_handler = self.get_content_handler(self.content_type)
if content_handler:
handlers.append(content_handler)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't the right fix for this. Unfortunately, we're going to need to dig a bit to figure out what the right fix is. The problem will be happening somewhere in the _register handling in the base ResponseHandler.

Part of the reason this is wrong is that this method is supposed to be using response handlers only (of which content handlers are a subclass, but not all responsehandlers are content handlers) so if you have to go looking in get_content_handler than a handler has not _registered properly, or has been clobbered somehow.

I haven't got the cycles to dig into it today.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a class attribute to help with this.

@@ -12,7 +12,6 @@
# under the License.
"""Base classes for response and content handlers."""


Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unreleased (although this is a fine tidy up)

if handler.content_handler:
content_handlers.append(handler.content_handler)
elif handler.response_handler:
response_handlers.append(handler.response_handler)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again this is probably not the right fix, as when a handler registers it can declare itself as both a content handler and a response handler and they are not mutually exclusive. For some things (notably the JSONHandler) it is both.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a class attribute so a handler can declare itself as just a content handler.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed this change hasn't been reverted yet. Looks like there is still some more work to do on this part.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've got a branch in progress that tries to address that in a slightly different way. We'll want to compare notes on that and decide which is best. I'll push it up shortly.

Copy link
Contributor Author

@trevormccasland trevormccasland Jul 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok I'll wait on that then. I found there was a typo in jsonhandler.py too that this was covering up.
add_reponse_handling -> add_response_handling

This commit aims to add a class attribute called add_response_handling
to the ResponseHandler. It's purpose is to allow a ContentHandler to
exclude itself from the response_handler collection so that more than
one ContentHandler with the same content-type can run tests correctly.

YAMLHandler was renamed to something pretty long, I think we can
use YDL as an acronym for YAMLDiskLoading if we don't like the length.
@@ -149,4 +153,4 @@ passed the ``response_data`` of the prior test and the argument within the
Please see the `JSONHandler source`_ for additional detail.

.. _JSONHandler source: https://github.com/cdent/gabbi/blob/master/gabbi/handlers/jsonhandler.py
.. _YAMLHandler tests: https://github.com/cdent/gabbi/blob/master/gabbi/tests/test_yaml_handler.py
.. _YAMLDiskLoadingJSONHandler tests: https://github.com/cdent/gabbi/blob/master/gabbi/tests/test_yaml_disk_loading_jsonhandler.py
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see this being used anywhere :/


response_json_paths:
$: @<subdir/values.yaml

Examples like this can be found in one of gabbi's `own tests`_.

When reading from disk you can apply the same JSONPath by adding a ':' to the
end of your file name. This allows you to store multiple API responses into
a JSON or YAML file to reduce file management when constructing your tests.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this may just have to read as '...multiple API responses into a JSON file...'

@cdent
Copy link
Owner

cdent commented Jul 8, 2018

Writing just to ack that I've had a look at this, but will need to look more closely tomorrow to more fully understand the added class attribute.

Also, whatever is going on with the failing pytest-driven tests needs to be addressed.

Thanks for plugging away.

@cdent
Copy link
Owner

cdent commented Jul 9, 2018

I'm going to try some experiments myself. I think one of the issues here is that the new subclassed handler should replace the json handler entirely, but I'm struggling to trace the logic in my brain so I need to hack on the code. I'll post back here when I've had some more concrete thoughts.

Tests with the same filename were giving weird connection errors
when running the tests[1]

[1] https://travis-ci.org/cdent/gabbi/jobs/400962655
@cdent
Copy link
Owner

cdent commented Jul 9, 2018

That need for different named flags is almost certainly a bug in the pytest handling. Good find. I'll try to fix that.

cdent added a commit that referenced this pull request Jul 9, 2018
This is an alternate version of pull #254, with the difference
in gabbi/suitemaker.py where response_handlers is only extended
if the response_* key has not been seen before. This means that
if the YAMLDiskLoadingJSONHandler is turned on it does all the
work of the JSONHandler for respone handling, because it is only
present once.

At the same time, the existing handing for content handlers means
that custom handlers which accept a content-type are used first (and
only) for the content type it accepts.

The issues with pytest and files having the same name remain
here. I tried to make some change to pytest test loading to
make the global cache use more distinguishing keys, but it was
distracting from progressing on the other work, so I'll deal with
that separately.
this patch aims to integrate the changes made in[1] revert changes
related to the class attribute 'add_response_handler' in[2] and
add unit tests to cover the suitemaker logic from[1].

[1]cdent#256
[2]cdent@241684e
for handler in handlers:
default_test_dict.update(handler.test_base)
if handler.response_handler:
key = 'response_%s' % handler.test_key_suffix
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since key is getting thrown away outside this look, constructing 'key' from 'response_%s' is probably not necessary (as it will be done for every suffix). Should be okay to use handler.test_key_suffix directly in the set.

@cdent
Copy link
Owner

cdent commented Jul 16, 2018

This is looking good. I added one more comment on a small thing that could be fixed to make it a bit more futureproof.

Thanks!

uses handler's test_key_suffix property rather than trying to
recreate its private _key member's value.
@cdent cdent merged commit bc20dac into cdent:master Jul 16, 2018
@cdent
Copy link
Owner

cdent commented Jul 16, 2018

This is released as 1.44.0

@trevormccasland If you get the opportunity to write a blog post or something like that about how you are using gabbi, especially the features you added, that would be great. Since people rarely read the docs all the way through, only something that does a bit of "we wanted to do this, so we added these features" kind of explanation will really get the idea across.

Thanks again.

@trevormccasland trevormccasland deleted the yaml-from-disk branch July 16, 2018 21:51
@trevormccasland
Copy link
Contributor Author

Thanks a lot for the reviews and working through the pull request with me. I will work on a blog post as soon as we get everything together on our side. Then I'll post the link here when it's ready.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

jsonhandler: allow reading yaml data from disk
3 participants