-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix issue with non-ASCII characters in Buffer and Stream attachments #255
Conversation
The Travis CI build for this pull request is currently failing due to #254. |
I plan to do a rebase on this pull request once #256 (the fix for the node.js v0.8 npm upgrade issue) is merged. |
40b7321
to
53a380d
Compare
I've rebased the pull request and the Travis CI build is now passing (no more npm upgrade issue) |
Hi @jbpros. Would it be possible for someone to take a look at this pull request sometime? Unfortunately attachments are pretty useless without this fix as without you can't attach any binary attachments (e.g. screenshots) or anything else containing non-ASCII characters (e.g. UTF-8 encoded text). Thanks |
data.on('end', function() { | ||
astTreeWalker.attach(Buffer.concat(buffers).toString(), mimeType); | ||
astTreeWalker.attach(Buffer.concat(buffers).toString('binary'), mimeType); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This assumes that the buffer is always going to be binary, Is this a safe assumption?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've just pushed a commit that adds some additional tests around this. The commit also fixes a bug I just found with attaching strings that contain non-ASCII characters.
Best I can tell from http://nodejs.org/api/buffer.html Buffer
is only designed to handle arrays of octals (bytes); just search for the word octet
on that page to see what I mean. Which would mean it is safe to assume that a Buffer
only ever contains binary. What sort of binary it contains (e.g. raw binary or a UTF-8 encoded string) depends on what constructor if Buffer
you use when creating the Buffer
you pass into Scenario.attach()
.
Looking at the encode64s
function in https://github.com/cucumber/gherkin/blob/master/js/lib/gherkin/formatter/json_formatter.js it assumes that the string you pass as an attachment to the JSONFormatter
only contains 8-bit characters; basically it can only base64 encode binary. Which makes sense as you can only base64 encode binary.
Cucumber only seems to store a MIME type (aka media type) with attachments/embeddings. It doesn't store the character encoding. Which means if you attach text, you have no idea from the output JSON what encoding the text is in. cucumber/cucumber-jvm#501 seems to be related.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @samccone. Did that answer your question?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@simondean Reading the docs here I see a note below that means 'binary' encoding will be removed in future versions of Node. Is it safe to use it here?
" 'binary' - A way of encoding raw binary data into strings by using only the first 8 bits of each character. This encoding method is deprecated and should be avoided in favor of Buffer objects where possible. This encoding will be removed in future versions of Node."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you thing about using the hex
encoding since the binary
encoding appears to be deprecated?
@nikulkarni currently the encode64s function in https://github.com/cucumber/gherkin/blob/master/js/lib/gherkin/formatter/json_formatter.js requires the input argument to be in binary format. If the encode64s function was changed to use the Buffer class, it would make it node.js specific and would no longer support running in a browser. There's probably a good case for node.js to keep the binary format as it makes it easier to interop between node.js and other JavaScript platforms. |
@simondean understood, thanks. Hopefully this PR gets merged soon, it fixes attachment of screenshots unless you pass correctly encoded image. |
Closing as with 5fb6706 there is no intermediate conversion to string and we just encode it as base64 when adding it to the json output. |
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs. |
This fixes #249. Buffers are not converted to strings using
buffer.toString('binary')
instead ofbuffer.toString()
.buffer.toString()
defaults toutf8
which broken any non-ASCII characters.This pull request also includes a fix for an issue in features/attachments.feature that meant the node.js version wasn't detected correctly and meant that streams weren't being tested in the feature file (they were still being tested in the specs).