Random arithmetic_exception - "long overflow" errors #5713

Open
soltmar opened this issue Jan 5, 2023 · 4 comments
Labels
bug Something isn't working v2.19.0 Issues and PRs related to version 2.19.0

soltmar commented Jan 5, 2023

Hi,

I'm randomly receiving the following errors in OpenSearch responses:

"_shards" : {
    "total" : 5,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 4,
    "failures" : [
      {
        "shard" : 3,
        "index" : "my_index",
        "node" : "Ed3vRYdSRdaHyhkzhtuXyQ",
        "reason" : {
          "type" : "arithmetic_exception",
          "reason" : "long overflow"
        }
      }
    ]
  },
...

With the sort request below:

GET my_index/_search
{
        "track_total_hits": true,
        "query":
        {
            "bool":
            {
                "must_not":
                [
                    {
                        "term":
                        {
                            "status_id": 8
                        }
                    }
                ]
            }
        },
        "size": "50",
        "sort":
        {
            "reminder_date": {
              "missing": "_last",
              "order": "asc"
            }
        }
}

To give a bit of context:

  • Tested on OpenSearch v1.3 and v2.3.
  • The index holds over 70,000 documents.
  • reminder_date is a date field.
  • "dynamic_date_formats": "dd/MM/yyyy" is set in the mappings.
  • Some documents have reminder_date set to null.

When I run the above query, OpenSearch sometimes returns no errors and sometimes fails with "arithmetic_exception".

Running the same query repeatedly gives a different number of results and failing shards almost every time.

Let me know if you need any further info.
Thanks

dblock (Member) commented Jan 5, 2023

Do you have an error stack in the logs? A complete error response?

dblock added the bug label Jan 5, 2023
soltmar (Author) commented Jan 6, 2023

I'm using OpenSearch on AWS. Error logs are enabled (sent to CloudWatch), but there are no logs related to that search query.

Do you know of any way to get these logs on AWS, or maybe in the query result itself?

By the way, when the "failures" key is present I still get some hits back, but not all of them.
Also, when missing is set to _first in the sort element everything works fine, so it looks like the problem is tied to _last only.
I did notice that for results where reminder_date is null, the sort element has the value below (not sure whether it helps):

....
"sort" : [
          -9223372036854775808
        ],
...

When it works, I get this:

{
  "took" : 103,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 74090,
      "relation" : "eq"
    },
...
}

And this when some shards fail:

{
  "took" : 53,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 2,
    "skipped" : 0,
    "failed" : 3,
    "failures" : [
      {
        "shard" : 1,
        "index" : "my_index",
        "node" : "YCEI0PpITD6h-xlV3udp-g",
        "reason" : {
          "type" : "arithmetic_exception",
          "reason" : "long overflow"
        }
      }
    ]
  },
  "hits" : {
    "total" : {
      "value" : 29482,
      "relation" : "eq"
    },
    ....
   }
}

soltmar (Author) commented Jan 6, 2023

I think I may have something.
After setting allow_partial_search_results=false, I started to receive logs:

Caused by: java.lang.ArithmeticException: long overflow
	at __PATH__(Math.java:949)
	at __PATH__(Math.java:925)
	at __PATH__(Instant.java:1236)
	at org.opensearch.index.mapper.DateFieldMapper$Resolution$1.convert(DateFieldMapper.java:106)
	at org.opensearch.index.mapper.DateFieldMapper$DateFieldType.parseToLong(DateFieldMapper.java:510)
	at org.opensearch.index.mapper.DateFieldMapper$DateFieldType.isFieldWithinQuery(DateFieldMapper.java:548)
	at org.opensearch.search.sort.FieldSortBuilder.isBottomSortShardDisjoint(FieldSortBuilder.java:481)
	at org.opensearch.search.internal.ShardSearchRequest$RequestRewritable.rewrite(ShardSearchRequest.java:549)
	at org.opensearch.search.internal.ShardSearchRequest$RequestRewritable.rewrite(ShardSearchRequest.java:531)
	at org.opensearch.index.query.Rewriteable.rewrite(Rewriteable.java:83)
	at org.opensearch.search.SearchService.canMatch(SearchService.java:1323)
	at org.opensearch.search.SearchService$2.onResponse(SearchService.java:472)
	... 121 more
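
The `Math.java` and `Instant.java` frames above are consistent with an exact-arithmetic conversion of an `Instant` to a `long`, such as `Instant.toEpochMilli()`. That is an assumption, since the method names are elided in the trace. A minimal plain-JDK sketch of how such a conversion produces the same "long overflow" message is shown below. It does not call any OpenSearch code, the class name `LongOverflowDemo` is hypothetical, and `Instant.MAX` is only a stand-in for an out-of-range instant; the actual instant involved in this issue is not known.

```java
import java.time.Instant;

// Hypothetical demo class; not part of OpenSearch.
final class LongOverflowDemo {

    // Returns the exception message if converting to epoch milliseconds
    // overflows a long, or null if the conversion succeeds.
    static String overflowMessage(Instant instant) {
        try {
            instant.toEpochMilli();
            return null;
        } catch (ArithmeticException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // An ordinary instant converts without error.
        System.out.println("EPOCH: " + overflowMessage(Instant.EPOCH));
        // Instant.MAX is far outside the range of a long in milliseconds,
        // so the exact multiplication inside toEpochMilli() throws.
        System.out.println("MAX:   " + overflowMessage(Instant.MAX));
    }
}
```

If the shard-side rewrite ends up with an instant that far out of range, the exception would surface as a per-shard failure like the ones in the responses above.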

I also found elastic/elasticsearch#52396 on the Elasticsearch side; I think it may be a similar problem.

dblock (Member) commented Jan 9, 2023

That's definitely something! I would keep narrowing this down to a 100% repro, maybe on a local instance with the same mapping. At the same time, I would try to write a unit test that calls whatever is at DateFieldMapper.java:106 with the value you see in the sort.
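
A possible starting point for that test, as a plain-JDK sketch only: it takes the sort value reported earlier in the thread (`-9223372036854775808`, which equals `Long.MIN_VALUE`) and pushes it through an `Instant` round trip. It does not call `DateFieldMapper`; that the code at line 106 performs a similar conversion is an assumption, and `SortValueProbe` and `roundTrip` are hypothetical names. Whether the round trip throws for this value may depend on the JDK version, so the sketch reports the outcome instead of asserting one.

```java
import java.time.Instant;
import java.util.OptionalLong;

// Hypothetical probe; a real unit test would call the OpenSearch mapper instead.
final class SortValueProbe {

    // Sort value seen on hits where reminder_date is null.
    static final long OBSERVED_SORT_VALUE = -9223372036854775808L;

    // Converts epoch millis to an Instant and back.
    // Returns empty if the conversion back to millis overflows.
    static OptionalLong roundTrip(long epochMillis) {
        try {
            return OptionalLong.of(Instant.ofEpochMilli(epochMillis).toEpochMilli());
        } catch (ArithmeticException e) {
            return OptionalLong.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println("is Long.MIN_VALUE: " + (OBSERVED_SORT_VALUE == Long.MIN_VALUE));
        System.out.println("round trip of observed value: " + roundTrip(OBSERVED_SORT_VALUE));
        System.out.println("round trip of 0: " + roundTrip(0L));
    }
}
```

Running the probe on the same JDK as the cluster nodes would show whether the sentinel value alone is enough to trigger the exception, or whether the date format in the mapping also plays a part.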
