Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error uploading resource with non utf-8 bytes #137

Closed
edrex opened this issue Apr 11, 2015 · 5 comments
Closed

Error uploading resource with non utf-8 bytes #137

edrex opened this issue Apr 11, 2015 · 5 comments

Comments

@edrex
Copy link

edrex commented Apr 11, 2015

Example using tank helper script from https://gist.github.com/edrex/8aaea3f453fe46237c1f

tank PUT /bags/mgsd/tiddlers/app Content-Type:'text/html' < a_file_with_non_utf8_bytes.html

HTTP/1.1 400 Bad Request
Connection: close
Content-Encoding: gzip
Content-Length: 1245
Content-Type: text/html; charset=utf-8
Date: Sat, 11 Apr 2015 17:35:35 GMT
Server: Apache/2.2.22 (Debian)
Vary: Accept-Encoding

Drag and drop upload through browser:

screen shot 2015-04-11 at 10 37 14 am

HTTP/1.1 500 Internal Server Error
Date: Sat, 11 Apr 2015 17:36:50 GMT
Server: Apache/2.2.22 (Debian)
Access-Control-Allow-Origin: https://tank.peermore.com
Access-Control-Expose-Headers: ETag, Content-Type, X-Tank-Key
Access-Control-Allow-Credentials: true
Vary: Accept-Encoding
Content-Encoding: gzip
Content-Length: 563
Connection: close
Content-Type: text/plain; charset=UTF-8

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/httpexceptor/__init__.py", line 58, in __call__
    return self.application(environ, start_response)
  File "/home/cdent/tiddlywebs/tank.peermore.com/tiddlywebplugins/tank/httperror.py", line 25, in __call__
    return self.application(environ, start_response)
  File "/usr/local/lib/python2.7/dist-packages/tiddlyweb/web/wsgi.py", line 190, in __call__
    output = self.application(environ, start_response)
  File "/usr/local/lib/python2.7/dist-packages/tiddlyweb/web/serve.py", line 158, in __call__
    return self.application(environ, start_response)
  File "/usr/local/lib/python2.7/dist-packages/tiddlyweb/web/serve.py", line 117, in __call__
    return self.application(environ, start_response)
  File "/usr/local/lib/python2.7/dist-packages/tiddlyweb/web/query.py", line 44, in __call__
    return self.application(environ, start_response)
  File "/usr/local/lib/python2.7/dist-packages/tiddlyweb/web/wsgi.py", line 128, in __call__
    return self.application(environ, start_response)
  File "/usr/local/lib/python2.7/dist-packages/tiddlyweb/web/extractor.py", line 34, in __call__
    return self.application(environ, start_response)
  File "/usr/local/lib/python2.7/dist-packages/tiddlyweb/web/wsgi.py", line 38, in __call__
    return self.application(environ, start_response)
  File "/usr/local/lib/python2.7/dist-packages/tiddlyweb/web/negotiate.py", line 30, in __call__
    return self.application(environ, start_response)
  File "/usr/local/lib/python2.7/dist-packages/tiddlywebplugins/cors.py", line 87, in __call__
    return self.application(environ, start_response)
  File "/usr/local/lib/python2.7/dist-packages/tiddlywebplugins/csrf.py", line 93, in __call__
    return app()
  File "/usr/local/lib/python2.7/dist-packages/tiddlywebplugins/csrf.py", line 75, in app
    output = self.application(environ, fake_start_response)
  File "/usr/local/lib/python2.7/dist-packages/selector.py", line 137, in __call__
    return app(environ, start_response)
  File "/usr/local/lib/python2.7/dist-packages/tiddlywebplugins/utils.py", line 87, in _require_role
    return handler(environ, start_response, *args, **kwds)
  File "/home/cdent/tiddlywebs/tank.peermore.com/tiddlywebplugins/tank/closet.py", line 47, in closet
    target_name)
  File "/home/cdent/tiddlywebs/tank.peermore.com/tiddlywebplugins/tank/closet.py", line 93, in _regular_tiddler
    tiddler.text = content.decode('utf-8')
  File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xab in position 252830: invalid start byte

I'm not sure if there's a bug here.

  • Should non utf-8 resources be accepted?
  • Should the 400 provide a more informative message?

Unrelated: 400s always come back as HTML. Should they respect Accept: text/plain?

@cdent
Copy link
Owner

cdent commented Apr 12, 2015

Should non utf-8 resources be accepted?

Generally speaking TiddlyWeb takes a pretty hard line about encoding. It expects text-based content to be utf-8 encoded. There are a few different reasons for this:

  • it makes handling on the server side easier
  • utf-8 can represent all(?) of unicode, so why bother with anything else
  • it's sort of a version of activism (or maybe just laziness)

Should the 400 provide a more informative message?

Yes, was there any message in the body of the 400 response?

Unrelated: 400s always come back as HTML. Should they respect Accept: text/plain?

There's been discussion in the past about doing content negotiation for error responses and the general consensus was it wasn't worth it. In default TiddlyWeb the responses are text/plain, which seems to align okay with what you would expect when using the system mostly as an API and where the status codes are the most meaningful part of the response. Tank overrides this behavior to templatize 400 responses into HTML because that's the level at which the errors matter: when using the API.

It would be possible (probably without too much difficulty) to get it to attend to the accept header but are the semantics there appropriate: Accept is used when things go well to say "I want the resource I've requested to come back as X". Does it also apply for error messages?

In any case, the fact that you're getting an untrapped 500 on drag and drop is a bug, that bit of "closet" code is not being careful when making a call to decode. The expected response there would have been the 400 that you're seeing elsewhere. I'll look into fixing that and in the process see whether the 400 is getting the message it deserves.

We can explore accepting other encodings if you like, but I'd rather people put their files in a reasonable encoding in the first place. What do you think?

/cc @FND

@cdent cdent closed this as completed in 2bd6ca9 Apr 12, 2015
@edrex
Copy link
Author

edrex commented Apr 12, 2015

I agree with all this (UTF-8 activism, semantic ambiguity of interpreting Accept for error responses).

My main issue was that the 400 response didn't say why the request was bad.

Old response text:

    <div class="message"> </div>
    <div class="extra">
        You left something out of that request, perhaps go back and
        try again.
     </div>

After 2bd6ca9:

    <div class="message">
      unable to decode tiddler, utf-8 expected: 'utf8' codec can't decode byte 0xab in position 252830: invalid start byte
    </div>
    ...

And the DND upload is correctly throwing a 400 now.

For me this issue seems resolved 🍹


WRT content negotiation for error messages, the original HTTP/1.1 RFC is fairly clear:

Any response containing an entity-body MAY be subject to negotiation, including error responses.

while the newer HTTP semantics RFC is more coy:

When responses convey payload information, whether indicating a
   success or an error, the origin server often has different ways of
   representing that information.

My feeling is that the server should return HTML errors only if the request has Accept: text/html, since browsers always send this in the initial request. The main reason is that otherwise extracting a detailed error message is difficult (eg for display in a client-side UI as in the DND component above).

@edrex
Copy link
Author

edrex commented Apr 12, 2015

I guess to more directly address the semantics of Accept, I think sometimes it means "this is how I would like the resource" (especially with GET requests) and other times it means "these are my preferred message formats for response messages" (for PUT, POST, DELETE, etc).

@FND
Copy link
Contributor

FND commented Apr 13, 2015

My feeling is that the server should return HTML errors only if the request has Accept: text/html

That seems like a sensible argument. To me, Accept not only means "this is how I would like the resource", but also "these are the formats I can parse" - so if we send HTML even though plain text would be just as expressive, that's kind of nasty. (I'm not advocating parsing error messages themselves, of course, but the client might want to pass them through to the UI, logs etc., so HTML is just noise there.)

@cdent
Copy link
Owner

cdent commented Apr 13, 2015

So it appears the outcome of this is that it would be nice if tank only sent text/html on errors some of the time. So I've created #138 as a reminder for that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants