Why Latin1 for report tables? #28
Shouldn't be an issue to use whatever charset you prefer. Have you tried?
Thanks for the info. I will give it a try and come back with the results.
One potential issue I can think of is the protobuf-c implementation on the client side. In any case, I'm curious to see if it works for you; if it doesn't, we'll investigate.
I did test this locally with a "simple" UTF-8 script name such as 'κόσμε', and it works. The Pinba2 server in use was the official container from this project. The clients were the PHP pinba extension (version 1.1.1) from Ubuntu Focal and a pure-PHP protobuf implementation. I created the report table as:

    CREATE TABLE `report_by_script_name` (
      `script` varchar(64) NOT NULL,
      ...
    ) ENGINE=PINBA DEFAULT CHARSET=utf8 COMMENT='v2/request/60/~script/no_percentiles/no_filters';

And I had to add, in the code querying its data:
P.S. About NUL chars: indeed, if I add
OK, great to know.
WDYT about changing the SQL files that are part of this repo? At the very least we could add a comment to them mentioning that Latin1 is not a requirement...
Sure, happy to accept a PR with utf8 everywhere in the SQL files.
Sure, will do. By the way, using Wireshark as a protobuf dissector, I checked the "raw message" being sent when there is a NUL byte in the middle of the PHP string used as the script_name:
So it seems that the code dropping those packets might be on the server side (I will test that again with Pinba2; the last tests were run against Pinba1).
Further testing disproved the above. |
I kind of assumed that the pinba and pinba2 engines would treat strings with NUL bytes in the middle differently, owing to the slightly different ways they unpack incoming protobuf packets. Thank you for the PRs; I have been very busy lately and will review them ASAP.
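The unpacking question matters because protobuf string fields are length-delimited on the wire: a tag, a varint length, then the raw bytes, with no terminator. The Python sketch below hand-encodes and decodes such a field to illustrate that a NUL byte in the middle survives the wire format itself, so any truncation would have to come from code that later treats those bytes as a C string. It is not the pinba, pinba2, or protobuf-c code; it only handles wire type 2, and the field number used for script_name (`SCRIPT_NAME_FIELD = 3`) is a hypothetical value, not checked against pinba.proto.

```python
# Minimal protobuf wire-format sketch (length-delimited fields only).
# Not the pinba/pinba2/protobuf-c code; for illustration only.

def encode_varint(n: int) -> bytes:
    """Encode a non-negative int as a base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def decode_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a varint starting at buf[pos]; return (value, new_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

def encode_string_field(field_number: int, value: bytes) -> bytes:
    """Wire type 2: tag, then length, then the raw bytes (no terminator)."""
    tag = (field_number << 3) | 2
    return encode_varint(tag) + encode_varint(len(value)) + value

def decode_fields(buf: bytes) -> dict[int, bytes]:
    """Decode length-delimited fields; other wire types are rejected here."""
    fields = {}
    pos = 0
    while pos < len(buf):
        tag, pos = decode_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x7
        if wire_type != 2:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        length, pos = decode_varint(buf, pos)
        fields[field_number] = buf[pos:pos + length]  # slice by length, not by NUL
        pos += length
    return fields

# HYPOTHETICAL field number for script_name; check pinba.proto for the real one.
SCRIPT_NAME_FIELD = 3

script_name = "/index.php\x00extra".encode("utf-8")
packet = encode_string_field(SCRIPT_NAME_FIELD, script_name)
decoded = decode_fields(packet)[SCRIPT_NAME_FIELD]
print(decoded)  # b'/index.php\x00extra'
```

Because the decoder slices by the declared length, the bytes after the NUL come through intact; a receiver that then copies the field with `strlen()` semantics would keep only `/index.php`.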
This is possible. What I did verify is that, for the pinba PHP extension, the truncation happens on the client side (possibly within code generated by the protoc compiler), so in those tests the NUL char was not sent to the engine at all. As for sending a NUL char as part of the protobuf packet: I did manage to do that using my own PHP implementation of the client, but the results were mixed. I tried to decode the raw protobuf packet to check its contents, using
Not to be pedantic, but the NUL character is valid in UTF-8, just as it is valid in ASCII. It is just C strings that have issues with it ;-)
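To make the distinction concrete, here is a small Python sketch (using the standard `ctypes` module as a stand-in for C code) showing that U+0000 round-trips through UTF-8 as the single byte 0x00, while a NUL-terminated buffer read the way `strlen()`-based code reads it stops at that byte. The sample string is made up for illustration.

```python
import ctypes

# U+0000 is a valid code point; UTF-8 encodes it as the single byte 0x00,
# exactly as ASCII does.
name = "script\x00name"
encoded = name.encode("utf-8")
assert encoded.decode("utf-8") == name  # round-trips cleanly

# A C char buffer holding the same bytes; ctypes appends a terminating NUL.
buf = ctypes.create_string_buffer(encoded)

print(buf.raw)    # all bytes plus the terminator: b'script\x00name\x00'
print(buf.value)  # C-string view, stops at the first NUL: b'script'
```

The data is intact in both cases; only the C-string interpretation loses everything after the first NUL.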
Is it OK to use, e.g., the utf8 charset when defining the report tables? It would seem to make sense, given the script/hostname data they contain.
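A quick way to see why Latin1 can be limiting for script and host names is to check which names are even representable in each charset. The sketch below uses Python's codecs; note the assumption that Python's `latin-1` (ISO-8859-1) approximates MySQL's `latin1`, which is actually based on cp1252, so the two differ for a few characters (Greek is in neither). The sample names are made up.

```python
def fits_charset(text: str, charset: str) -> bool:
    """Return True if every character of text can be encoded in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False

for name in ["/index.php", "/café.php", "/κόσμε.php"]:
    print(name, fits_charset(name, "latin-1"), fits_charset(name, "utf-8"))
# /index.php True True
# /café.php True True
# /κόσμε.php False True
```

Byte widths also differ: "café" is 4 bytes in Latin-1 but 5 in UTF-8. MySQL's `varchar(64)` counts characters, not bytes, and MySQL's `utf8` is the 3-byte `utf8mb3`, so characters outside the Basic Multilingual Plane would need `utf8mb4`.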