Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix issue with non-ASCII characters in Buffer and Stream attachments #255

Closed
wants to merge 2 commits into from

Conversation

simondean
Copy link
Contributor

This fixes #249. Buffers are not converted to strings using buffer.toString('binary') instead of buffer.toString(). buffer.toString() defaults to utf8 which broken any non-ASCII characters.

This pull request also includes a fix for an issue in features/attachments.feature that meant the node.js version wasn't detected correctly and meant that streams weren't being tested in the feature file (they were still being tested in the specs).

@simondean
Copy link
Contributor Author

The Travis CI build for this pull request is currently failing due to #254.

@simondean
Copy link
Contributor Author

I plan to do a rebase on this pull request once #256 (the fix for the node.js v0.8 npm upgrade issue) is merged.

@simondean
Copy link
Contributor Author

I've rebased the pull request and the Travis CI build is now passing (no more npm upgrade issue)

@simondean
Copy link
Contributor Author

Hi @jbpros. Would it be possible for someone to take a look at this pull request sometime? Unfortunately attachments are pretty useless without this fix as without you can't attach any binary attachments (e.g. screenshots) or anything else containing non-ASCII characters (e.g. UTF-8 encoded text). Thanks

data.on('end', function() {
astTreeWalker.attach(Buffer.concat(buffers).toString(), mimeType);
astTreeWalker.attach(Buffer.concat(buffers).toString('binary'), mimeType);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assumes that the buffer is always going to be binary, Is this a safe assumption?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just pushed a commit that adds some additional tests around this. The commit also fixes a bug I just found with attaching strings that contain non-ASCII characters.

Best I can tell from http://nodejs.org/api/buffer.html Buffer is only designed to handle arrays of octals (bytes); just search for the word octet on that page to see what I mean. Which would mean it is safe to assume that a Buffer only ever contains binary. What sort of binary it contains (e.g. raw binary or a UTF-8 encoded string) depends on what constructor if Buffer you use when creating the Buffer you pass into Scenario.attach().

Looking at the encode64s function in https://github.com/cucumber/gherkin/blob/master/js/lib/gherkin/formatter/json_formatter.js it assumes that the string you pass as an attachment to the JSONFormatter only contains 8-bit characters; basically it can only base64 encode binary. Which makes sense as you can only base64 encode binary.

Cucumber only seems to store a MIME type (aka media type) with attachments/embeddings. It doesn't store the character encoding. Which means if you attach text, you have no idea from the output JSON what encoding the text is in. cucumber/cucumber-jvm#501 seems to be related.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @samccone. Did that answer your question?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@simondean Reading the docs here I see a note below that means 'binary' encoding will be removed in future versions of Node. Is it safe to use it here?

" 'binary' - A way of encoding raw binary data into strings by using only the first 8 bits of each character. This encoding method is deprecated and should be avoided in favor of Buffer objects where possible. This encoding will be removed in future versions of Node."

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you thing about using the hex encoding since the binary encoding appears to be deprecated?

@jbpros jbpros added this to the 0.5 major features milestone Dec 22, 2014
@simondean
Copy link
Contributor Author

@nikulkarni currently the encode64s function in https://github.com/cucumber/gherkin/blob/master/js/lib/gherkin/formatter/json_formatter.js requires the input argument to be in binary format. If the encode64s function was changed to use the Buffer class, it would make it node.js specific and would no longer support running in a browser. There's probably a good case for node.js to keep the binary format as it makes it easier to interop between node.js and other JavaScript platforms.

@nikulkarni
Copy link

@simondean understood, thanks. Hopefully this PR gets merged soon, it fixes attachment of screenshots unless you pass correctly encoded image.

@jbpros jbpros removed this from the major features milestone Oct 10, 2015
@charlierudolph
Copy link
Member

Closing as with 5fb6706 there is no intermediate conversion to string and we just encode it as base64 when adding it to the json output.

@lock
Copy link

lock bot commented Oct 25, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators Oct 25, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cannot attach non string data to report
5 participants