Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nasty crash when there are non-UTF-8 characters in /etc/hosts #153

Closed
bortzmeyer opened this issue Sep 2, 2024 · 7 comments
Closed

Nasty crash when there are non-UTF-8 characters in /etc/hosts #153

bortzmeyer opened this issue Sep 2, 2024 · 7 comments
Labels

Comments

@bortzmeyer
Copy link

If the comments in /etc/hosts are in my native language, vpn-slice crashes with a scary message:

  File "/usr/local/bin/vpn-slice", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vpn_slice/__main__.py", line 625, in main
    do_post_connect(env, args)
  File "/usr/local/lib/python3.12/dist-packages/vpn_slice/__main__.py", line 290, in do_post_connect
    providers.hosts.write_hosts(host_map, args.name)
  File "/usr/local/lib/python3.12/dist-packages/vpn_slice/posix.py", line 79, in write_hosts
    lines = hostf.readlines()
            ^^^^^^^^^^^^^^^^^
  File "<frozen codecs>", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 775: invalid continuation byte

After finding ( perl -ne 'print if /[^[:ascii:]]/' hosts ) and deleting them, it works. Ideally, I would like the file to be processed even if there are non-ASCII characters. Otherwise, a better error message would be nice.

@gmacon gmacon added the bug label Sep 2, 2024
@dlenski
Copy link
Owner

dlenski commented Sep 2, 2024

If the comments in /etc/hosts are in my native language, vpn-slice crashes with a scary message:

What language and what encoding is this? (Si je dois deviner, j'imagine que c'est du français representé en codage iso8559-1, d'où 'é'.encode('iso-8859-1').decode('utf-8')UnicodeDecodeError)

As the error message states, it's not complaining about non-ASCII bytes, it's complaining about non-UTF8 bytes.

@dlenski
Copy link
Owner

dlenski commented Sep 2, 2024

Ideally, I would like the file to be processed even if there are non-ASCII UTF-8 characters. Otherwise, a better error message would be nice.

Want to write up a PR to process the /etc/hosts file as a binary file rather than a text file with an implied/assumed encoding? https://github.com/dlenski/vpn-slice/blob/master/vpn_slice/posix.py#L94-L116

@gmacon gmacon changed the title Nasty crash when there are non-ASCII characters in /etc/hosts Nasty crash when there are non-UTF-8 characters in /etc/hosts Sep 3, 2024
@gmacon
Copy link
Collaborator

gmacon commented Sep 3, 2024

I'm kinda surprised that /etc/hosts works with IDNs... I would have expected the A-label (punycode) representation to be required.

@bortzmeyer
Copy link
Author

I'm kinda surprised that /etc/hosts works with IDNs... I would have expected the A-label (punycode) representation to be required.

Indeed, but the composed characters were in the comments.

@bortzmeyer
Copy link
Author

If the comments in /etc/hosts are in my native language, vpn-slice crashes with a scary message:

What language and what encoding is this? (Si je dois deviner, j'imagine que c'est du français representé en codage iso8559-1, d'où 'é'.encode('iso-8859-1').decode('utf-8')UnicodeDecodeError)

As the error message states, it's not complaining about non-ASCII bytes, it's complaining about non-UTF8 bytes.

You're right, if I use UTF-8 (like everyone should do), it works.

@dlenski
Copy link
Owner

dlenski commented Sep 4, 2024

I'm kinda surprised that /etc/hosts works with IDNs... I would have expected the A-label (punycode) representation to be required.

Indeed, but the composed characters were in the comments.

Thanks, good clarifying question by @gmacon! This makes sense.

#153 (comment) should indeed be the right solution. Does this patch do the trick to prevent crashes even if you have non-UTF-8 characters in comments, @bortzmeyer?

diff --git a/vpn_slice/posix.py b/vpn_slice/posix.py
index ca267cd..531e42c 100644
--- a/vpn_slice/posix.py
+++ b/vpn_slice/posix.py
@@ -99,14 +99,13 @@ class HostsFileProvider(HostsProvider):

     def write_hosts(self, host_map, name):
         tag = f'vpn-slice-{name} AUTOCREATED'
-        with open(self.path, 'r+') as hostf:
+        with open(self.path, 'r+b') as hostf:
             fcntl.flock(hostf, fcntl.LOCK_EX)  # POSIX only, obviously
             lines = hostf.readlines()
-            keeplines = [l for l in lines if not l.endswith(f'# {tag}\n')]
+            keeplines = [l for l in lines if not l.endswith(f'# {tag}\n'.encode())]
             hostf.seek(0, 0)
             hostf.writelines(keeplines)
-            for ip, names in host_map:
-                print(f"{ip} {' '.join(names)}\t\t# {tag}", file=hostf)
+            hostf.writelines(f"{ip} {' '.join(names)}\t\t# {tag}\n".encode() for ip, names in host_map)
             hostf.truncate()
         return len(host_map) or len(lines) - len(keeplines)

dlenski added a commit that referenced this issue Sep 5, 2024
dlenski added a commit that referenced this issue Sep 5, 2024
dlenski added a commit that referenced this issue Sep 5, 2024
dlenski added a commit that referenced this issue Sep 5, 2024
@bortzmeyer
Copy link
Author

#153 (comment) should indeed be the right solution. Does this patch do the trick to prevent crashes even if you have non-UTF-8 characters in comments, @bortzmeyer?

Yes, perfect. Thanks.

@dlenski dlenski closed this as completed in d1f419a Sep 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants