Always return Unicode strings #85

Open
sampsyo opened this issue Jan 29, 2013 · 5 comments

Comments

@sampsyo
Collaborator

sampsyo commented Jan 29, 2013

I noticed recently that the strings returned from our library are sometimes bytes and sometimes Unicode. Due to ElementTree's default behavior, only those strings that are non-ASCII are returned as Unicode objects. For example:

>>> rec = musicbrainzngs.search_recordings(artist='alt-j', recording='piano', limit=1)['recording-list'][0]
>>> rec['title']
u'\u2766 (Piano)'
>>> rec['release-list'][0]['title']
'An Awesome Wave'

The recording title, which has a "special" character in it, is a unicode object. The release title, which is all ASCII, is a str object. For consistency's sake (and for an eventual Python 3 port), the library should always return unicode objects.

Anyone have any bright ideas about the best way to go about addressing this? (I have a nagging sensation that we might have discussed this in the past, but I can't remember if we came to a conclusion about what to do.)

@ghost ghost assigned sampsyo Jan 29, 2013
@alastair
Owner

This is a good idea; I can't remember whether we talked about it. It is annoying that ElementTree returns both types. My preference would be a helper to use in the parse_element methods, either called explicitly or applied as a decorator.
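
A minimal sketch of what such a helper could look like, in both the explicit and the decorator form. The names `to_text`, `returns_text`, and `parse_recording` are hypothetical and not part of the library; the sketch also assumes that any byte strings in the parsed result are UTF-8 encoded.

```python
import functools


def to_text(value, encoding="utf-8"):
    """Recursively convert byte strings in a parsed result to text.

    Hypothetical helper; assumes byte strings are encoded as `encoding`.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    if isinstance(value, dict):
        return {to_text(k, encoding): to_text(v, encoding)
                for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return type(value)(to_text(item, encoding) for item in value)
    # Text strings, numbers, None, etc. pass through unchanged.
    return value


def returns_text(func):
    """Decorator form: normalize whatever the wrapped parser returns."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return to_text(func(*args, **kwargs))
    return wrapper


@returns_text
def parse_recording(element):
    # Hypothetical stand-in for a parse_element-style function that
    # returns a mix of byte strings and text strings.
    return {b"title": b"An Awesome Wave",
            "release-list": [{"title": "\u2766 (Piano)"}]}
```

On Python 2, where `bytes` is an alias for `str`, the same `isinstance` check would catch the ASCII-only results. The decorator keeps the individual parse functions untouched, at the cost of walking every result a second time.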

@alastair
Owner

alastair commented Feb 6, 2014

Moving this to apichange. Do we think that it's an incompatible change, or can we just do it?

@alastair alastair added this to the apichange milestone Feb 6, 2014
@JonnyJD
Collaborator

JonnyJD commented Feb 6, 2014

I consider this an apichange.

People who expect a bytestring might run into problems at some point: they try to decode the value and fail, because you can't decode a (non-ASCII) unicode string.

Example in mind:

$ python2
>>> "blå".decode('utf8')
u'bl\xe5'
>>> unicode("blå").decode('utf8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 2: ordinal not in range(128)

Something like that.
Basically, this would hurt users who try to work around the shortcomings by using unicode everywhere in Python 2.
I haven't checked yet how isrcsubmit would handle that, but it would probably be fine, since I check whether I have unicode or bytes everywhere.
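
The kind of caller-side check described above can be sketched roughly as follows. `decode_if_bytes` is a hypothetical name, not taken from isrcsubmit or the library, and UTF-8 is an assumed encoding.

```python
def decode_if_bytes(value, encoding="utf-8"):
    """Decode only when given a byte string; leave text untouched.

    Hypothetical caller-side guard; the encoding is an assumption.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Works whether the library hands back bytes or text.
title_from_bytes = decode_if_bytes("blå".encode("utf-8"))
title_from_text = decode_if_bytes("blå")
```

Code written this way keeps working if the library switches to returning text everywhere, whereas an unconditional `.decode()` call would not.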

@alastair
Owner

alastair commented Feb 6, 2014

OK, good reason to hold off for apichange. Thanks.

@snejus

snejus commented Aug 4, 2024

I just checked this, and the problem no longer seems to occur, so I think this issue can be closed:

[ins] In [13]: rec = musicbrainzngs.search_recordings(artist='alt-j', recording='piano', limit=1)['recording-list'][0]

[ins] In [14]: rec["title"]
Out[14]: '❦ (Piano)'

[ins] In [15]: rec["release-list"][0]["title"]
Out[15]: 'An Awesome Wave'
