Skip to content
This repository was archived by the owner on Nov 29, 2021. It is now read-only.

Add handling for keyword fields with a maximum length of 1024 characters #103

Merged
merged 4 commits into from
Nov 7, 2019

Conversation

FreezeWarp
Copy link
Contributor

@FreezeWarp FreezeWarp commented Oct 11, 2019

When the ElasticSearch APM agent sends values for certain fields that are longer than 1024 characters, an exception is thrown:

{
  "accepted": 12,
  "errors": [
    {
      "message": "error validating JSON document against schema: I[#] S[#] doesn't validate with \"transaction#\"\n  I[#] S[#/allOf/3] allOf failed\n    I[#/context/request/url/search] S[#/allOf/3/properties/context/properties/request/properties/url/properties/search/maxLength] length must be \u003c= 1024, but got 1411",
      "document": "{\"transaction\":{\"trace_id\":\"e11c7e8a9f59efddc48537ec80515602\",\"id\":\"cbf74982abf1f97b\",\"parent_id\":null,\"type\":\"generic\",\"duration\":3625.9340000000002,\"timestamp\":1571250468353990,\"result\":\"200\",\"name\":\"[omitted]\",\"context\":{\"request\":{\"http_version\":\"\\/2.0\",\"method\":\"GET\",\"socket\":{\"remote_address\":\"169.10.176.228\",\"encrypted\":true},\"response\":{\"finished\":true,\"headers_sent\":true,\"status_code\":200},\"url\":{\"protocol\":\"https\",\"hostname\":\"[omitted]\",\"port\":\"443\",\"pathname\":\"\\/index.php\",\"search\":\"[a very long, omitted value]\",\"full\":\"[a very long, omitted value]\"},\"headers\":{\"user-agent\":\"Mozilla\\/5.0 (X11; Fedora; Linux x86_64) AppleWebKit\\/537.36 (KHTML, like Gecko) Chrome\\/71.0.3578.98 Safari\\/537.36\",\"cookie\":\"[omitted]\"},\"env\":{\"SERVER_SOFTWARE\":\"nginx\\/1.13.6\"}}}},\"sampled\":null,\"span_count\":{\"started\":0,\"dropped\":0}}}"
    }
  ]
}

The particular implementation here is as follows:
mb_substr is used to perform substring extraction; ElasticSearch uses UTF-8 exclusively, so this is the encoding given to mb_substr. In addition, in order to check string length, strlen is first used (it is constant time), and only if strlen exceeds the character limit is mb_substr used (which is linear time).

The keyword handing is added to most of the locations it is found in the Python agent (https://github.com/elastic/apm-agent-python/search?q=keyword_field&unscoped_q=keyword_field); a few are omitted where it doesn't seem to make any reasonable sense (e.g. the PHP version, and exceptions codes; while both technically can exceed the character limit, they should really only do so in extremely atypical situations).

The keyword handling was also omitted from tag names (simply because I wasn't 100% confident in modifying that code).

mb_substr is used to perform substring extraction; ElasticSearch uses UTf-8 exclusively, so this is the encoding given to mb_substr. In addition, in order to check string length, strlen is first used (it is constant time), and only if strlen exceeds the character limit is mb_substr used (which is linear time).

The keyword handing is added to most of the locations it is found in the Python agent (https://github.com/elastic/apm-agent-python/search?q=keyword_field&unscoped_q=keyword_field); a few are omitted where it doesn't seem to make any reasonable sense (e.g. the PHP version, and exceptions codes; while both technically can exceed the character limit, they should really only do so in extremely atypical situations).

The keyword handling was also omitted from tag names (simply because I wasn't 100% confident in modifying that code).
Copy link
Owner

@philkra philkra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see comment about missing test

/*
* Functions to convert values for transmission to ElasticSearch.
*/
class Encoding
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add test for this class, thanks in advance

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added PhilKra\Tests\Helper\EncodingTest with commit 8de0b22. Let me know if this looks good.

Regards,
JTP

@FreezeWarp FreezeWarp requested a review from philkra October 17, 2019 21:26
Copy link
Owner

@philkra philkra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@philkra philkra merged commit 076d9c4 into philkra:master Nov 7, 2019
@philkra
Copy link
Owner

philkra commented Nov 7, 2019

thank you for your contribution 💪

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants