Always return Unicode strings #85

sampsyo · 2013-01-29T19:11:41Z

I noticed recently that the strings returned from our library are sometimes bytes and sometimes Unicode. Due to ElementTree's default behavior, only those strings that are non-ASCII are returned as Unicode objects. For example:

>>> rec = musicbrainzngs.search_recordings(artist='alt-j', recording='piano', limit=1)['recording-list'][0]
>>> rec['title']
u'\u2766 (Piano)'
>>> rec['release-list'][0]['title']
'An Awesome Wave'

The recording title, which has a "special" character in it, is a unicode object. The release title, which is all ASCII, is a str object. For consistency's sake (and for an eventual Python 3 port), the library should always return unicode objects.

Anyone have any bright ideas about the best way to go about addressing this? (I have a nagging sensation that we might have discussed this in the past, but I can't remember if we came to a conclusion about what to do.)

alastair · 2013-03-11T21:24:17Z

This is a good idea. I can't remember if we talked about it. The fact that ElementTree returns both is annoying. My preference would be for a helper method to use in the parse_element methods, either called explicitly or applied as a decorator.
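
For illustration, here is a minimal sketch of what such a helper could look like, in both explicit and decorator form. It is written for Python 3, where the bytes/text split is `bytes` vs. `str`. The names `to_text`, `returns_text`, and `parse_example` are hypothetical and not part of musicbrainzngs, and decoding as UTF-8 is an assumption.

```python
import functools


def to_text(value, encoding="utf-8"):
    """Recursively convert bytes to str; leave every other type untouched."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    if isinstance(value, dict):
        return {to_text(k, encoding): to_text(v, encoding) for k, v in value.items()}
    if isinstance(value, list):
        return [to_text(item, encoding) for item in value]
    return value


def returns_text(func):
    """Decorator form: normalize whatever the wrapped parser returns."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return to_text(func(*args, **kwargs))
    return wrapper


@returns_text
def parse_example():
    # Hypothetical stand-in for a parse_element-style function that
    # returns a mix of bytes and text values.
    return {
        "title": "\u2766 (Piano)",
        "release-list": [{"title": b"An Awesome Wave"}],
    }
```

With the decorator applied, every string in the result of `parse_example()` is a `str`, whether or not it contains non-ASCII characters. The explicit form, `to_text(result)`, does the same for callers that prefer not to decorate.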

alastair · 2014-02-06T15:23:11Z

Moving this to apichange. Do we think that it's an incompatible change, or can we just do it?

JonnyJD · 2014-02-06T15:36:37Z

I consider this an apichange.

People might run into problems expecting a bytestring at some point, trying to do some decoding and failing since you can't decode a (non-ascii) unicode string.

Example in mind:

$ python2
>>> "blå".decode('utf8')
u'bl\xe5'
>>> unicode("blå").decode('utf8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 2: ordinal not in range(128)

Something like that.
Basically, this would hurt users who try to work around the shortcomings themselves and use unicode everywhere in Python 2.
I haven't checked yet how isrcsubmit would handle that, but it would probably do fine, since I check everywhere whether I have unicode or bytes.

alastair · 2014-02-06T15:42:13Z

OK, good reason to hold off for apichange. Thanks.

snejus · 2024-08-04T14:00:17Z

I just checked this, and the issue no longer seems to occur, so I think it can be closed:

[ins] In [13]: rec = musicbrainzngs.search_recordings(artist='alt-j', recording='piano', limit=1)['recording-list'][0]

[ins] In [14]: rec["title"]
Out[14]: '❦ (Piano)'

[ins] In [15]: rec["release-list"][0]["title"]
Out[15]: 'An Awesome Wave'

ghost assigned sampsyo Jan 29, 2013

alastair added this to the apichange milestone Feb 6, 2014

JonnyJD added improvement API change labels Jun 11, 2015

alastair mentioned this issue Dec 14, 2016

Requests reloaded #199

Open

bal-e mentioned this issue Aug 4, 2024

Remove redundant unicode decoding beetbox/beets#5379

Merged
